Tier 3 · Guard & build3.322 min

Build with Claude Code — two agents

A country road curving past an old woolshed under a green hillside at golden hour

Everything so far has been getting you ready to build one of these without hurting anyone — yourself included. Now we build. Two agents, and the second one is not what you’d expect: you build one to work, and one to watch fail. The failure is the lesson.

If you did Working with Claude, you met Claude Code there — the version of Claude that can read your files, run tools, and carry out a task end to end on your own machine. That’s what we’re pointing at a job. You don’t need to be a programmer; you need to be able to describe a job clearly and hold the discipline from the last five lessons. This is Anchor 1 — learn by doing it once, small.

Build A — a good agent, end to end (the transferable pattern)

We build a reconciliation-checker: an agent that reads a batch of invoices and a bank statement, and flags the ones that don’t match. It’s deliberately chosen — run it past the Tier 1 triage and it lands cleanly in the agent bucket: a rule you can write down, easily reversible, a mistake that lands on no one much, and your own numbers rather than someone else’s data. That’s not a coincidence; it’s how you should pick your first real build.

Build it in the order the course taught, because that order is the transferable pattern:

Scope the blast radius (2.1). Read-only. It sees one folder of invoices and one statement file. It cannot write, send, pay, or reach anything else. The worst it can do is mis-flag a row, and you’ll catch that.
Write the criteria, not vibes (2.3). The matching rule, explicitly: what counts as a match, what tolerance, what’s an exception. The agent returns the mismatches it found and why — evidence — not “the books look fine.”
Write the guardrails (3.1). It flags what it’s unsure of rather than guessing. It never guesses a figure. It stops at anything that would leave the machine — there’s nothing to send here, and that’s the point of a first build.
Test it (3.2). Spot-check a sample against the source. Feed it a deliberately broken statement and confirm it catches the break. Trust it because it passed, not because you made it.

That’s a complete, useful, safe agent — and the shape of it transfers straight to the next one you build. (If your bottleneck is the inbox rather than the books, the same four steps build an inbox-triage assistant that sorts and drafts but never sends — just note it now touches other people’s data, so the data question from Tier 1 comes first.)

Build B — the Recruiter, built to watch it fail

Now the uncomfortable one. We build the redact-then-score recruiter — the agent the comfortable advice says is fine: strip the names, let it score, keep a human at the end. We build it specifically so you can watch that advice break in your own hands.

Build the redact-then-score agent. It takes applications, redacts the obvious identity fields — name, age, photo, address — and scores the rest against criteria. Exactly the “safe AI screening” a vendor would sell you.
Run the name-swap probe on it (3.2). Take one application. Score it. Now change only the name — nothing else — and score it again. Then do it across a batch: swap genders, swap obviously-different-origin names, hold everything else constant.
Watch what happens. The score moves. It moves because the model reconstructs identity from the proxies you didn’t redact — the school, the address, the gaps, the phrasing — exactly as Tier 2 warned. You redacted the name and the bias walked in through the postcode.
Now sit with the real lesson. You did everything the comfortable version told you to. You redacted. You kept a human at the end. And your own test just showed the thing is skewing on identity you thought you’d removed. This is where an agent-builder earns their keep: you learn to say no. For a high-stakes decision about a person, where the fixes leak and the test comes back skewed, the disciplined answer is a narrow, non-ranking, human-decided process — or not automating the decision at all.

The point of Build B was never a working CV-ranker. It’s the judgment to pull back — the most valuable thing this course can hand you, and the one you’ll only believe once you’ve watched it fail on your own screen.

What you’ve actually built

Two patterns you can reuse across the gallery: the safe build (scope → criteria → guardrails → test) for work that belongs to an agent, and the honest stop for work that doesn’t, however capable the tool gets. Both are the same discipline pointed in opposite directions — and both are yours to answer for.

Which build would you reach for first — and be honest about why? If it’s the recruiter because it’d save the most time, that’s exactly the pull this lesson exists to check.

You can build one on a public tool. The last lesson of this tier asks a sharper question: whose computer is it running on, and whose laws does the data fall under? The sovereign option.

Shared freely, in good faith. If it's been of value, a koha toward development and running costs is warmly welcomed.

Leave a koha →

← 3.2 3.4 →

Build with Claude Code — two agents

Build A — a good agent, end to end (the transferable pattern)

Build B — the Recruiter, built to watch it fail

What you’ve actually built

Next