Tier 4 · Deploy & answer · 4.1 · 15 min

Deploy and oversee — a green tick isn’t success

[Image: A country road curving past an old woolshed under a green hillside at golden hour. Credit: Agents at Work, CC BY 4.0]

The whole promise of an agent is that it runs while you’re not watching — overnight, or while you’re with a customer. That’s also the whole danger, and the two are the same fact. This lesson is about the discipline of running an agent unattended without letting “unattended” quietly become “unaccountable.”

The trap of the green tick

An agent finishes its run and reports success. The job says “done.” Everything’s green. Here’s the thing to burn in: a green tick means the agent completed the steps it was told to. It does not mean it did the right thing.

A reconciliation-checker can run cleanly and flag the wrong invoices. A triage agent can sort a full inbox and quietly misfile the one urgent message. A screening agent can score every application without error and skew hard against one group — no error, no crash, green tick, real harm. The agent can only tell you it did what it did. It cannot tell you that what it did was correct, fair, or wise — that’s your judgment, and it doesn’t disappear because the run succeeded.

So the first rule of oversight: never read “completed” as “correct.” Completion is a claim about the process. Correctness is a claim about the world, and only a person checking against the world can make it.
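One way to keep the two claims apart in practice is to record them separately. The sketch below is only an illustration, not a prescribed design: `RunResult`, `pick_review_sample`, and `is_trusted` are hypothetical names, and it assumes the agent's outputs can be collected into a list that a person can spot-check.

```python
import random
from dataclasses import dataclass


@dataclass
class RunResult:
    status: str             # what the agent reports, e.g. "completed"
    outputs: list           # the decisions it made during the run
    verified: bool = False  # set to True only after a person has checked a sample


def pick_review_sample(outputs, k=5, seed=None):
    """Draw up to k outputs at random for a person to check against reality."""
    rng = random.Random(seed)
    return rng.sample(outputs, min(k, len(outputs)))


def is_trusted(result):
    # "completed" alone is never enough; a human check must also have happened.
    return result.status == "completed" and result.verified
```

The point of the design is that `status` is written by the agent and `verified` is written by a person. A green tick fills in the first field and leaves the second untouched.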

Oversight you build in, not bolt on

You can’t stand over an agent that runs at 2am. So oversight has to be built into how it runs — three plain habits:

Start narrow, widen on evidence

This is Anchor 2, continuous improvement, as a deployment rule. Don’t hand a new agent the full job on day one and walk away. Run it on a slice, watch it, read its trail, check its output. Widen its role (more volume, more autonomy, less checking) only as it earns trust, on evidence that it behaves, not on the fact that it hasn’t obviously broken yet. The businesses that get burned are the ones that mistook “it ran without complaint for a week” for “it’s safe to stop looking.”
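"Widen on evidence" can be written down as a simple gate. The following is a minimal sketch under stated assumptions: the function name `next_slice`, the thresholds, and the doubling step are all hypothetical, and narrowing the slice again after a bad review is an added assumption rather than something this lesson prescribes. It assumes a person has reviewed some of the agent's outputs and counted how many were wrong.

```python
def next_slice(current_share, reviewed, wrong,
               min_reviewed=50, max_error_rate=0.02, step=2.0):
    """Return the share of the work (0.0 to 1.0) the agent handles next period.

    reviewed: how many outputs a person actually checked this period
    wrong:    how many of those the person judged incorrect
    """
    if reviewed < min_reviewed:
        # Not enough evidence yet. "It hasn't broken" is not evidence, so hold.
        return current_share
    error_rate = wrong / reviewed
    if error_rate > max_error_rate:
        # Evidence of bad calls: narrow the slice again (an assumed policy).
        return current_share / step
    # Evidence that it behaves: widen, but never past the whole job.
    return min(current_share * step, 1.0)
```

Note what the gate does when nobody has reviewed anything: it holds the slice where it is. A quiet week with no checking never widens the agent's role.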

The oversight move

Before an agent runs unattended on real work:

Picture your agent running overnight. Something goes wrong at 3am — not a crash, a wrong call. When you sit down in the morning, how would you even know? If the honest answer is “I might not,” that’s the gap to close before you deploy, not after.
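One possible way to close that gap is a morning summary built from the agent's trail. This is a sketch only: it assumes the trail is a list of records with hypothetical `action` and `confidence` fields, and `morning_summary` and the 0.8 floor are illustrative choices, not part of any particular tool.

```python
from collections import Counter


def morning_summary(trail, confidence_floor=0.8):
    """Condense an overnight trail into something a person can read in minutes.

    Each trail entry is assumed to be a dict with "action" and "confidence" keys.
    """
    by_action = Counter(entry["action"] for entry in trail)
    needs_review = [e for e in trail if e["confidence"] < confidence_floor]
    return {
        "total": len(trail),
        "by_action": dict(by_action),
        "needs_review": needs_review,  # low-confidence calls to look at first
    }
```

A summary like this only tells you where to look first. The confidence figure is still the agent's own claim, so a person checking a few of the confident calls against the real world remains part of the job.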

Next

You’ve deployed it and you’re watching it. Now comes the part people most want to skip and can least afford to: the law you’re actually operating under.
