You Deserve The Cluster

Apr 6

The case for running your historical collections through research computing, and why it costs less than you think.

6 Comments

Great post. The only thing I would add is that we can do a lot with older-generation A100 GPUs, which are often idle because demand is focused on newer H100s for frontier work. I’m able to run large OCR and NER jobs most days without waiting.


Thiago

Apr 8

That’s great to hear! Which open source models do you favor, Jim? I have to try this!


Jim Clifford

Apr 8

I’ve been using olmOCR for months to re-OCR PDFs at scale on the cluster. For NER, my student fine-tuned a Qwen model for Early Modern texts. More recently I’ve been testing Gemma 4 26b MoE, with promising results.

Thiago

Apr 8

This is awesome! I just applied to use these resources at my university. Loren, which open source models do you use? I’ve been indexing early modern manuscript records with Codex.

I'm going to write something this week about using Ollama models. The key driver for me has been experimentation. You can easily run experiments across various models and configurations, and something that works with one model won't necessarily work with another. And, omg, change the default context window. Make it really small. You get a 10x speedup.
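For anyone who wants to try this, here is a minimal sketch of what sweeping models and context sizes could look like. It assumes a local Ollama server on its default port and uses the `num_ctx` option of the `/api/generate` endpoint to set the context window per request. The helper names (`build_payload`, `experiment_grid`, `run`) are made up for illustration, and any speedup will depend on your hardware and documents, so time it yourself.

```python
import itertools
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model, prompt, num_ctx):
    """Build one request body; num_ctx sets the context window in tokens."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON reply instead of a stream
        "options": {"num_ctx": num_ctx},
    }


def experiment_grid(models, context_sizes, prompt):
    """One payload per (model, context size) combination."""
    return [
        build_payload(model, prompt, num_ctx)
        for model, num_ctx in itertools.product(models, context_sizes)
    ]


def run(payload, url=OLLAMA_URL):
    """Send one payload to the server and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

To use it, build a grid with whatever model tags you have already pulled (for example `experiment_grid(["model-a", "model-b"], [1024, 4096], "Extract the names in: ...")`, where the tags are placeholders), then call `run` on each payload and compare both the output and the wall-clock time.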

Thiago

Apr 8

Thanks! Yeah, Gemini 3 is very good, although I dislike Google's Antigravity. I'm way behind, as I have zero technical skills, but I'm always interested in learning more!

Computational History
