The Small Model Bet: Bigger Isn't Always Better


Why tiny, purpose-built AI models will beat today's bloated, everything-to-everyone computational monsters.


I just read Utkarsh Kanwat's piece on "Betting Against Agents" and found myself nodding along with that familiar mix of vindication and frustration that comes with watching an entire industry chase shiny objects while ignoring basic math.

His core argument is solid: today's AI agents are computationally wasteful, context-hungry beasts that burn through tokens. The economics don't add up when you're feeding massive prompts to massive models for every tiny decision in a multi-step workflow. But I think he's only scratching the surface of why this whole agent craze feels so fundamentally backwards to me.
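To make the economics concrete, here's a back-of-the-envelope sketch. The numbers are illustrative, not real pricing: assume a hypothetical input price per thousand tokens and a large context that gets re-processed at every step of an agent chain.

```python
# Rough sketch of why re-sending context gets expensive.
# All numbers are illustrative assumptions, not real API pricing.
PRICE_PER_1K_TOKENS = 0.01   # hypothetical input price, dollars
CONTEXT_TOKENS = 50_000      # system prompt + tools + history
STEPS = 20                   # decisions in one multi-step workflow

# The full context is re-processed on every single step of the chain.
cost = STEPS * CONTEXT_TOKENS / 1000 * PRICE_PER_1K_TOKENS
print(f"${cost:.2f} per workflow run")  # $10.00
```

And that's the optimistic version: in a real conversation the context *grows* with each step, so the true cost curve is worse than linear.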

The Fine-Tuning Amnesia

Remember when GPT-4 dropped back in 2023? We were all wrestling with those tiny 8k context windows, and I spent a lot of time and energy dealing with chunking strategies and vector embeddings. Every day was a new experiment in RAG architectures, trying to cram relevant information into those precious few tokens.

But the whole time, one thought kept coming back. Most of the time, I’m doing just one thing. If I’m coding, I don’t need a model that also writes poetry or translates languages. And if I’m translating text, why carry the full weight of human knowledge just to do that?

The conclusion I reached back then feels even more relevant now. The future isn't these monolithic, everything-to-everyone models. It's tiny, local, purpose-built systems that excel at one thing without the computational overhead of being good at everything else.

Everyone was talking about fine-tuning in early 2023. There were startups popping up left and right, each promising to help you train specialized models for your specific use case. Then somewhere along the way, we all got distracted by the promise of AGI, dazzled by million-token context windows, and quietly moved on from this entirely sensible approach.

And to be fair, it does make sense. As of this writing, Anthropic’s latest model (Claude-4) doesn’t even offer fine-tuning. Why would it? Fine-tuning is hard, costly, and slow. With massive context windows, “just throw everything in” isn’t just easier—it’s often the best solution.

The Dictionary Problem

Kanwat's math is compelling, but there's an even simpler way to think about why the current approach feels so wrong. It's like sending an entire dictionary and thesaurus to Google Docs every time you want to use spell-check.

When I watch these agent frameworks in action, I see the same pattern over and over: massive context windows stuffed with instructions, examples, and background information that gets re-processed for every single decision in a chain.

It's the "if you only have a hammer, everything looks like a nail" problem, except our hammer costs $5 per swing, takes 30 seconds per swing, and gets more expensive with every swing.

Instead of feeding tons of context to these general-purpose models, why not train smaller models that already know their specific domain? A model fine-tuned for SQL generation doesn't need a primer on database concepts every time it runs. A customer service agent doesn't need to rediscover what your company does with each interaction.
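The shape of that idea can be sketched in a few lines: route each task to the smallest model that already knows the domain, and fall back to a general model only when nothing specialized fits. The model names below are invented placeholders, not real checkpoints.

```python
# Hypothetical sketch: dispatch tasks to small purpose-built models
# instead of one general model with a huge prompt.
# Model identifiers are made up for illustration.
SPECIALISTS = {
    "sql": "local/sql-gen-1b",        # assumed fine-tuned for SQL generation
    "support": "local/support-3b",    # assumed tuned on company FAQs
    "translate": "local/translate-1b",
}

def route(task_type: str) -> str:
    """Pick the smallest model that already knows the domain."""
    # Anything unrecognized falls back to an expensive generalist.
    return SPECIALISTS.get(task_type, "cloud/general-70b")

print(route("sql"))     # local/sql-gen-1b
print(route("poetry"))  # cloud/general-70b
```

The point isn't the dictionary lookup, it's what the lookup implies: each specialist ships with its domain knowledge baked into the weights, so the per-call prompt shrinks to just the task at hand.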

Where I Think We're Headed

The truth is, most business problems don’t require massive, do-it-all models. There’s rarely a need to be great at everything at once. Businesses need reliability, speed, and reasonable costs. Reliability most of all.

I'm still betting on small. Not because I'm anti-progress or skeptical of AI's potential, but because I've seen this pattern before in tech. The initial breakthrough always comes as a massive, general-purpose solution. Then reality sets in, and we start optimizing for actual use cases rather than demo magic.

I think we'll see the current agent fever cool down as people realize the economics don't work. The companies that survive will be the ones that figure out how to decompose these complex workflows into efficient, specialized components.

Why This Actually Matters

There's big talk about how we're currently "at an inflection point" with these AI tools. And I get it, there have been some pretty amazing advances this past year. But we're not at the important inflection point yet, not really.

All of these companies are burning through billions of dollars. Actually losing billions. Because right now it's all about market adoption, not profitability. What's their endgame? I'm sure some smart business people have educated guesses. Wait until everyone in the world depends on these tools for basic daily tasks, then jack up the prices? Get these models integrated into every internet-connected device you own and start feeding you ads? I don't know.

But something has to give. The current economics are completely unsustainable. If prices get raised to levels that would actually make these companies profitable, using these tools becomes wholly unprofitable for everyone else. We'd price out exactly the use cases that make them valuable in the first place.

What's the solution? Small, locally hosted, domain-specific models. Models that don't require massive cloud infrastructure or venture capital life support. Models that actually work for businesses that need to control their costs and their data.

Key Takeaway

If there's one takeaway from Utkarsh's article, it's the risk of compounding errors. In most businesses, even 90 percent accuracy isn't good enough, let alone 60.
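The arithmetic behind that is brutal. If each step in a workflow succeeds independently with probability p, a chain of n steps succeeds with probability p to the n, and even very good per-step accuracy collapses fast:

```python
# Reliability of a multi-step chain when each step succeeds
# independently with probability p.
def chain_reliability(p: float, n: int) -> float:
    return p ** n

# Twenty-step workflow at various per-step accuracies:
print(round(chain_reliability(0.99, 20), 3))  # 0.818
print(round(chain_reliability(0.95, 20), 3))  # 0.358
print(round(chain_reliability(0.90, 20), 3))  # 0.122
```

Even at 99 percent per step, one run in five fails end to end. At 90 percent, nearly nine in ten do.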

Success comes from balance. I'm not betting against agents like Utkarsh is. I'm betting the future will be a combination of smaller, fine-tuned models, human oversight, and deterministic steps.

The Hope in the Hype

Despite my skepticism about how things are currently being built, I'm genuinely excited about where this is all going.

Ultimately, cost and reliability are the foundations that all these generative AI tools, agents or otherwise, depend on. Once we figure out how to build purpose-built, fine-tuned systems that are both dependable and efficient, the results are going to be incredible.

Imagine models that cost pennies instead of dollars, and respond in milliseconds instead of seconds. That's the future I want to build toward.

The math will win eventually. It always does.