Criteria, not vibes
Agents at Work — CC BY 4.0
Ask an agent “is this a good candidate?” or “is this supplier reliable?” and it will give you a confident answer. That’s the problem. A fluent verdict with nothing underneath it is the easiest thing in the world for these tools to produce, and the hardest thing to argue with — which is exactly why you should never let an agent hand you one.
The design rule for any agent that judges: it returns the evidence and the reasoning, against criteria you wrote down first. It does not return a verdict.
Vibes don’t travel
A verdict (a score, a “yes”, a ranking) is a black box. You can’t check it, you can’t explain it to the person it affects, and you can’t tell whether it came from something that should matter or something that shouldn’t (the applicant’s name, the supplier’s postcode, a stray word in a CV). When it’s wrong, you find out too late, and you’re the one answering for it.
Evidence against criteria is the opposite. If the agent’s job is “flag overdue invoices over 60 days,” the criterion is written, the output is checkable in a second, and a mistake is obvious. The more the work matters, the more this matters — but it holds everywhere, across the whole gallery:
- Bookkeeper: not “does this look right?” but the reconciliation rule, and every exception it found, with the figures.
- Competitive Analyst: not “here’s what competitors are doing” but each claim with its source, so you can see whether to stand on it.
- Market Analyst: not a confident synthesis but the findings, each marked as cited fact or as the agent’s own inference.
- Recruiter: not a shortlist or a score but “here is where this application does and doesn’t meet the written, job-relevant criteria” — and a human decides.
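To make “checkable in a second” concrete, here is a minimal sketch of the overdue-invoice example. It assumes “over 60 days” means unpaid and more than 60 days past the due date; the `Invoice` and `Flag` types and the `flag_overdue` function are hypothetical names invented for illustration, not part of any particular tool. The point is only that each flag carries the written criterion and the figures behind it, so a mistake would be easy to spot.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: the written criterion, agreed before anything runs.
CRITERION = "Invoice is unpaid and more than 60 days past its due date"
THRESHOLD_DAYS = 60


@dataclass(frozen=True)
class Invoice:  # hypothetical record shape
    invoice_id: str
    due: date
    paid: bool


@dataclass(frozen=True)
class Flag:
    invoice_id: str
    criterion: str  # which written rule was applied
    evidence: str   # the figures a person can check


def flag_overdue(invoices, today):
    """Return one Flag per invoice that meets the written criterion."""
    flags = []
    for inv in invoices:
        days_late = (today - inv.due).days
        if not inv.paid and days_late > THRESHOLD_DAYS:
            evidence = (
                f"due {inv.due.isoformat()}, "
                f"{days_late} days late as of {today.isoformat()}"
            )
            flags.append(Flag(inv.invoice_id, CRITERION, evidence))
    return flags


if __name__ == "__main__":
    # Made-up invoices, for illustration only.
    sample = [
        Invoice("A", date(2024, 1, 1), paid=False),
        Invoice("B", date(2024, 2, 1), paid=False),
    ]
    for flag in flag_overdue(sample, today=date(2024, 3, 15)):
        print(flag.invoice_id, "|", flag.criterion, "|", flag.evidence)
```

Nothing in the output says whether the invoices “look right”. It says which rule was applied and shows the dates, which is what lets a person confirm or dispute each flag.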
Write the criteria first — and write them down
“First” is doing real work in that sentence. Criteria you settle before the agent runs are criteria you chose for good reasons. Criteria you reach for after, looking at the output, are just a story you tell to justify the answer you already have — and if the answer was biased, so is the justification.
Written-down matters too. Criteria in your head aren’t auditable, can’t be handed to the agent cleanly, and quietly drift from one case to the next. On paper, they’re a contract: this is what we’re judging on, this and not that. If a criterion is something you couldn’t say out loud to the person affected — “we mark down gaps in employment,” say — writing it down is what forces you to notice.
Why this makes the human gate real
This is the design choice that rescues the last lesson. A person handed a verdict rubber-stamps it — that’s automation bias, and it’s most of why human sign-off fails. A person handed evidence against criteria has something to actually weigh, and something to disagree with. “The agent scored her 6” invites a nod. “The agent found she meets four of the five written criteria and can’t tell on the fifth” invites a decision. Same agent, same person — but the second one is a gate and the first one is a stamp.
It’s Anchor 3 in build form: an agent you can answer for is one whose work you can see. Evidence against written criteria is what “see” means in practice.
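The difference between a stamp and a gate can be sketched in how the agent’s findings are presented to the reviewer. This is an illustrative sketch only: the function name `summarise_for_reviewer`, the three status labels, and the tuple layout are assumptions made for the example. It counts how many written criteria are met, lists the evidence for each, and leaves the decision line blank for the human.

```python
from collections import Counter

# Assumption: three possible findings per criterion, with "unclear" allowed.
STATUSES = ("met", "not met", "unclear")


def summarise_for_reviewer(findings):
    """Build a plain-text summary from (criterion, status, evidence) tuples.

    No score and no recommendation is produced; the last line is left
    for the person who makes the decision.
    """
    for _, status, _ in findings:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status!r}")

    counts = Counter(status for _, status, _ in findings)
    lines = [
        f"Meets {counts['met']} of {len(findings)} written criteria; "
        f"{counts['not met']} not met, {counts['unclear']} unclear."
    ]
    for criterion, status, evidence in findings:
        lines.append(f"- [{status}] {criterion}: {evidence}")
    lines.append("Decision: (left to the human reviewer)")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder criteria and evidence, invented for illustration.
    example = [
        ("Criterion 1", "met", "evidence quoted from the application"),
        ("Criterion 2", "unclear", "application does not say either way"),
    ]
    print(summarise_for_reviewer(example))
```

The “unclear” line is the useful one: it tells the reviewer exactly where to look, which a single number never does.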
The design move
For any agent that judges or recommends, before you build it:
- Write the criteria — explicit, job-relevant, the things that actually bear on the decision, and only those.
- Design the output as evidence + rationale — “here’s what I found, here’s how it maps to each criterion, here’s where I’m unsure” — not a score or a call.
- Keep the decision human wherever the stakes or the rights gate demand it (Tier 1).
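One way to hold an agent to the steps above is to check its output before anyone reads it. The sketch below is a hedged illustration, not a prescribed schema: the field names (`criterion_id`, `status`, `evidence`, `rationale`), the list of forbidden verdict keys, and the `parse_report` function are all assumptions. It accepts a report only if it covers each written criterion exactly once, backs every “met” or “not_met” with evidence, and carries no score or call.

```python
from dataclasses import dataclass

# Assumptions for this sketch: allowed statuses and keys treated as verdicts.
ALLOWED_STATUS = {"met", "not_met", "unsure"}
VERDICT_KEYS = {"score", "verdict", "rank", "recommendation"}


@dataclass(frozen=True)
class Finding:
    criterion_id: str
    status: str
    evidence: str   # what the agent found
    rationale: str  # how it maps to the criterion


def parse_report(raw, criteria):
    """Validate a raw agent report (a dict) against the written criteria.

    `criteria` maps criterion ids to their written text.
    Raises ValueError if the report is not evidence + rationale.
    """
    leaked = VERDICT_KEYS.intersection(raw)
    if leaked:
        raise ValueError(f"report contains a verdict field: {sorted(leaked)}")

    try:
        # Unexpected keys inside a finding (for example a per-criterion
        # score) make the dataclass constructor raise TypeError.
        findings = [Finding(**item) for item in raw.get("findings", [])]
    except TypeError as exc:
        raise ValueError(f"malformed finding: {exc}") from exc

    if sorted(f.criterion_id for f in findings) != sorted(criteria):
        raise ValueError("findings must cover each written criterion exactly once")

    for f in findings:
        if f.status not in ALLOWED_STATUS:
            raise ValueError(f"unknown status: {f.status!r}")
        if f.status != "unsure" and not f.evidence.strip():
            raise ValueError(f"{f.criterion_id}: a finding needs evidence")
    return findings
```

Rejecting the report outright when a verdict field appears is a deliberate choice in this sketch: it keeps the score from ever reaching the reviewer, rather than trusting them to ignore it.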
You’ll reuse those criteria in Tier 3, when you test the agent, because criteria you can write down are criteria you can check the agent against.
Take a judgment you’d want an agent to help with. Write the three criteria that should actually decide it. Now the honest question: is there anything currently swaying that call — a gut feel, a proxy — that you’d be uncomfortable seeing on that written list?
Next
One criterion deserves its own lesson, because it’s where the good intentions go wrong: keeping the agent blind to who the person is. It’s necessary — and, on its own, not enough.