Tier 3 · Guard & build3.512 min

Which model tier runs your agent

A fiery orange-and-pink sunset over a harbour ringed by dark hills

The last lesson asked one question about your model — whose computer is it running on? (custody). This one asks the other half: which tier of model should run this agent? When you’re running one agent it barely matters. When you’re running several, or one of them daily, it becomes a real decision — about cost and about quality — and it’s Anchor 2, continuous improvement, in a very concrete form: put the capability where it earns its keep, and not a dollar more.

The two instincts that both fail

“Always use the best.” Comfortable, expensive, and it teaches you nothing about where the money is doing work. “Vibes” — this agent feels important, so it gets the top model. But how important an agent feels correlates badly with the specific kind of difficulty a stronger model actually addresses. Most agents aren’t limited by the model’s capability; they’re limited by a vague job or messy inputs, and a bigger model fixes neither.

The disciplined answer is the same triage the whole course has taught, pointed at your gallery: score each agent on the characteristics that genuinely reward a stronger model, and pay the top tier only where they’re present.

Where the top tier earns its keep — across your gallery

Reasoning leverage — long chains where an early error compounds silently. A Bookkeeper reconciliation-checker running one written rule is low leverage; a cheaper tier does it well. An agent making a call that chains across many documents is high.
Synthesis leverage — reconciling sources that disagree, and adversarial reading (what a source leaves out). Your Market or Competitive Analyst, weighing conflicting reports, is where the top tier shows.
Strategic depth — where a mediocre answer isn’t wrong, just shallow, and the shallowness costs you. A cheaper model summarises; a stronger one notices the framing that changes your decision.

And the two things a bigger model will not fix — which the earlier tiers already taught you:

Making things up is held down by grounding — sources, criteria, a human check (Tier 3’s guardrails), not by the price of the model.
Bias does not reliably shrink with tier. The Recruiter’s lesson holds: you don’t buy your way out of a fairness problem with a more expensive model — you scope it, test it, and sometimes decline it.

Then two plain checks: readiness — a capability-hungry agent handed a vague job produces expensive confusion, not brilliance — and volume — tier pricing barely matters for a once-a-week agent and compounds for one that runs all day.

The move that keeps it model-agnostic

Here’s what makes this sit cleanly beside the last lesson rather than fighting it: the framework doesn’t care whose model it is. It tells you where capability earns its keep — and that’s just as true of the sovereign, New-Zealand- or EU-hosted models from Lesson 3.4 as of any public frontier tier. So the two questions compose into one grid:

Whose computer (custody) — decided by what the agent touches.
Which tier (capability) — decided by whether the agent’s work rewards a stronger model.

A sensitive-data agent belongs on sovereign infrastructure whatever its tier; a capability-hungry agent on non-sensitive work can reach for the strongest tier available. You allocate on both axes, deliberately, rather than defaulting the whole fleet to the most expensive box.

Naming the tier — carefully

At the time of writing, the most capable widely-released model is Claude Fable 5, above the Opus, Sonnet, and Haiku tiers — but that is exactly the kind of fact that dates: names, capabilities, and prices change often, and the sovereign options move too. The durable claim is simply that a higher tier is more capable than the ones below it. For current specifics, check the source (anthropic.com/news, docs.claude.com) rather than trusting a course page from memory — the same evidence discipline you’d demand of the agent itself. (This course’s legislative-watch keeps an eye on when these facts shift.)

The build move

Triage each agent on reasoning, synthesis, and strategic depth. Two of three high → a candidate for the top tier. Otherwise a cheaper tier, honestly.
Don’t let hallucination-exposure or bias push the tier up — those are grounding and scope jobs.
Then run the cheap experiment: for a candidate, run it once on your current tier and once on the top, and compare the outputs yourself. The triage says where the experiment is worth it; the experiment tells the truth.

Take the agents in your gallery. Which one would you actually pay the top tier for — and can you name which of reasoning, synthesis, or strategic depth justifies it? If the honest answer is “it just feels important,” that’s the instinct this lesson exists to check.

That’s the guard-and-build tier done: scope, criteria, guardrails, testing, the two builds, and the two questions about your model — whose computer, and which tier. Tier 4 puts the agent to work and keeps you answerable for it.

Shared freely, in good faith. If it's been of value, a koha toward development and running costs is warmly welcomed.

Leave a koha →

← 3.4 4.1 →