Oversight: keeping agents on a leash
How to let an AI agent help without letting it cause real damage. The pilot-and-autopilot rule.
The pilot-and-autopilot rule
Modern planes fly themselves most of the time. But there’s still a pilot, awake, watching the dashboard. Why? Because the autopilot is great at the boring 99% and terrible at the surprising 1%. The pilot is there for the 1%.
Agents are the same idea. They’re great at boring, repetitive steps. They’re terrible at “wait, something weird just happened.” That’s where you come in.
Three levels of human oversight
The AI safety world has settled on a clear menu:
- Human in the loop. A person approves each risky step before the agent does it. (“About to send this email, OK?”) Slow but safest.
- Human on the loop. The agent runs by itself. A person watches and can stop it. Like a pilot monitoring autopilot.
- Human out of the loop. Full autonomy. The agent decides and acts without anyone watching. Only safe for tiny, easily reversible jobs.
The right level depends on the blast radius of a mistake.
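To make that menu concrete, here is a minimal Python sketch of the same three-way decision. The risk scores, thresholds, and names (`Oversight`, `oversight_for`) are invented for illustration; a real system would tune them to its own actions and how reversible they are.

```python
from enum import Enum

class Oversight(Enum):
    IN_THE_LOOP = "human approves each risky step"
    ON_THE_LOOP = "human watches and can stop it"
    OUT_OF_THE_LOOP = "full autonomy"

def oversight_for(risk_score: int) -> Oversight:
    """Pick an oversight mode from a 0-100 'blast radius' score.

    Thresholds are made up for this example; in practice you would
    set them based on how hard each action is to undo.
    """
    if risk_score >= 60:       # hard to undo: money spent, files deleted, emails sent
        return Oversight.IN_THE_LOOP
    if risk_score >= 25:       # annoying but recoverable mistakes
        return Oversight.ON_THE_LOOP
    return Oversight.OUT_OF_THE_LOOP  # tiny, easily reversible jobs

print(oversight_for(80))  # Oversight.IN_THE_LOOP
```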
Try it yourself
Move the dial to set how much damage an action could do if it goes wrong, and watch the right oversight mode change: the more damage possible, the more a human should be in the loop. A few sample dial positions:
- Danger 30, human on the loop: summarize 50 customer support tickets. Mostly reversible; bad summaries waste time, not money.
- Danger 22, human out of the loop: tag photos in a private album. Reversible if needed.
- Danger 15, human out of the loop: draft an email (you review before sending). Reversible; you're the last gate.
Rule of thumb: the harder it is to undo, the more a human should approve before it happens.
Why this is a big deal in 2026
Governments noticed. In August 2026, the European Union’s AI Act begins requiring that any “high-risk” AI system (things like hiring tools, medical AI, or self-driving cars) be built so a real person can step in. Similar laws are coming in U.S. states. Translation: in many places, “the AI did it” is no longer a legal defense.
Security researchers also published their first top-10 list of agent-specific dangers in 2026: things like prompt injection through tools, runaway loops, and agents leaking secrets. Same idea as the famous OWASP top 10 for web security, but for agents.
What good oversight looks like (in plain words)
- Set a budget. Cap the number of steps, the time, or the money the agent can spend.
- Make risky steps ask first. “I’m about to delete 12 files. Approve?” Don’t accept a vague “yes/no” prompt: list what will happen.
- Log everything. If something breaks, you need to see exactly what the agent did and why.
- Give it the smallest possible permissions. Read-only access first. Add powers only when you trust the workflow.
- Practice the abort button. Know how to stop the agent before you let it run live. (The sketch after this list pulls these habits together.)
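Here is a minimal sketch of an agent loop that applies the checklist: a step budget, an approval gate that spells out what a risky step will do, and a log of every action. Everything in it (the `Action` class, the `risky` flag, the `run_agent` loop) is invented for illustration, not any real framework’s API.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("agent")

@dataclass
class Action:
    description: str         # spell out exactly what will happen
    risky: bool              # hard to undo => a human must approve first
    run: Callable[[], None]  # the actual effect

def ask_human(action: Action) -> bool:
    """Approval gate: list what will happen, not a vague yes/no."""
    answer = input(f"About to: {action.description}. Approve? [y/N] ")
    return answer.strip().lower() == "y"

def run_agent(actions: list[Action], max_steps: int = 10) -> None:
    for step, action in enumerate(actions, start=1):
        if step > max_steps:                         # budget: cap total steps
            log.warning("Step budget hit; aborting.")
            return
        if action.risky and not ask_human(action):   # risky steps ask first
            log.info("Skipped (human said no): %s", action.description)
            continue
        log.info("Step %d: %s", step, action.description)  # log everything
        action.run()

# Usage: drafting is safe to automate; deleting files is not.
run_agent([
    Action("draft a reply email (not sent)", risky=False,
           run=lambda: log.info("draft saved")),
    Action("delete 12 files in reports/", risky=True,
           run=lambda: log.info("files deleted")),
])
```

Note the design choice: the least-privilege habit lives outside this loop. The safest version of `run` is one that can only read; you grant write or delete powers only after you trust the workflow.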
The takeaway
A useful agent is like a very fast intern. Smart, eager, sometimes wrong, occasionally about to do something they shouldn’t. The job of oversight is to keep the eager part and catch the wrong part before it costs anything real.
Quick check
1. What does “human in the loop” mean?
2. When should you require human approval before the agent acts?
3. Which of these is part of good agent oversight?
4. Why is “effective human oversight” a legal requirement in some places in 2026?
Where we go next
You now understand what AI is, how LLMs work, where they fail, and how agents take action. One module left, covering the newest 2026 ideas: AI that sees and hears, AI that looks things up, and AI that pauses to think before it answers.