Oversight: keeping agents on a leash
How to let an AI agent help without letting it cause real damage. The pilot-and-autopilot rule.
The pilot-and-autopilot rule
Modern planes fly themselves most of the time. But there’s still a pilot, awake, watching the dashboard. Why? Because the autopilot is great at the boring 99% and terrible at the surprising 1%. The pilot is there for the 1%.
Agents are the same idea. They’re great at boring, repetitive steps. They’re terrible at “wait, something weird just happened.” That’s where you come in.
Three levels of human oversight
The AI safety world has settled on a clear menu:
- Human in the loop. A person approves each risky step before the agent does it. (“About to send this email, OK?”) Slow but safest.
- Human on the loop. The agent runs by itself. A person watches and can stop it. Like a pilot monitoring autopilot.
- Human out of the loop. Full autonomy. The agent decides and acts without anyone watching. Only safe for tiny, easily reversible jobs.
The right level depends on the blast radius of a mistake.
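To make that menu concrete, here is a minimal Python sketch of the same three-way decision. The risk scores, thresholds, and names (`Oversight`, `oversight_for`) are invented for illustration; a real system would tune them to its own actions and how reversible they are.

```python
from enum import Enum

class Oversight(Enum):
    IN_THE_LOOP = "human approves each risky step"
    ON_THE_LOOP = "human watches and can stop it"
    OUT_OF_THE_LOOP = "full autonomy"

def oversight_for(risk_score: int) -> Oversight:
    """Pick an oversight mode from a 0-100 'blast radius' score.

    Thresholds are made up for this example; in practice you would
    set them based on how hard each action is to undo.
    """
    if risk_score >= 60:       # hard to undo: money spent, files deleted, emails sent
        return Oversight.IN_THE_LOOP
    if risk_score >= 25:       # annoying but recoverable mistakes
        return Oversight.ON_THE_LOOP
    return Oversight.OUT_OF_THE_LOOP  # tiny, easily reversible jobs

print(oversight_for(80))  # Oversight.IN_THE_LOOP
```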
Try it yourself
Move the dial to set how much damage an action could do if it goes wrong, and watch the right oversight mode change: the more damage possible, the more a human should be in the loop. A few sample dial positions:
- Danger 30, human on the loop: summarize 50 customer support tickets. Mostly reversible; bad summaries waste time, not money.
- Danger 22, human out of the loop: tag photos in a private album. Reversible if needed.
- Danger 15, human out of the loop: draft an email (you review before sending). Reversible; you're the last gate.
Rule of thumb: the harder it is to undo, the more a human should approve before it happens.
Why this is a big deal in 2026
Governments noticed. In August 2026, the European Union’s AI Act begins requiring that any “high-risk” AI system (things like hiring tools, medical AI, or self-driving cars) be built so a real person can step in. Similar laws are coming in U.S. states. Translation: in many places, “the AI did it” is no longer a legal defense.
Security researchers also published their first top-10 list of agent-specific dangers in 2026: things like prompt injection through tools, runaway loops, and agents leaking secrets. Same idea as the famous OWASP top 10 for web security, but for agents.
What good oversight looks like (in plain words)
- Set a budget. Cap the number of steps, the time, or the money the agent can spend.
- Make risky steps ask first. “I’m about to delete 12 files. Approve?” Don’t accept a vague “yes/no” prompt: list what will happen.
- Log everything. If something breaks, you need to see exactly what the agent did and why.
- Give it the smallest possible permissions. Read-only access first. Add powers only when you trust the workflow.
- Practice the abort button. Know how to stop the agent before you let it run live. (The sketch after this list pulls these habits together.)
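Here is a minimal sketch of an agent loop that applies the checklist: a step budget, an approval gate that spells out what a risky step will do, and a log of every action. Everything in it (the `Action` class, the `risky` flag, the `run_agent` loop) is invented for illustration, not any real framework’s API.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("agent")

@dataclass
class Action:
    description: str         # spell out exactly what will happen
    risky: bool              # hard to undo => a human must approve first
    run: Callable[[], None]  # the actual effect

def ask_human(action: Action) -> bool:
    """Approval gate: list what will happen, not a vague yes/no."""
    answer = input(f"About to: {action.description}. Approve? [y/N] ")
    return answer.strip().lower() == "y"

def run_agent(actions: list[Action], max_steps: int = 10) -> None:
    for step, action in enumerate(actions, start=1):
        if step > max_steps:                         # budget: cap total steps
            log.warning("Step budget hit; aborting.")
            return
        if action.risky and not ask_human(action):   # risky steps ask first
            log.info("Skipped (human said no): %s", action.description)
            continue
        log.info("Step %d: %s", step, action.description)  # log everything
        action.run()

# Usage: drafting is safe to automate; deleting files is not.
run_agent([
    Action("draft a reply email (not sent)", risky=False,
           run=lambda: log.info("draft saved")),
    Action("delete 12 files in reports/", risky=True,
           run=lambda: log.info("files deleted")),
])
```

Note the design choice: the least-privilege habit lives outside this loop. The safest version of `run` is one that can only read; you grant write or delete powers only after you trust the workflow.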
The takeaway
A useful agent is like a very fast intern. Smart, eager, sometimes wrong, occasionally about to do something they shouldn’t. The job of oversight is to keep the eager part and catch the wrong part before it costs anything real.
Quick check
1. What does “human in the loop” mean?
2. When should you require human approval before the agent acts?
3. Which of these is part of good agent oversight?
4. Why is “effective human oversight” a legal requirement in some places in 2026?
Where we go next
You now understand what AI is, how LLMs work, where they fail, and how agents take action. One module left, covering the newest 2026 ideas: AI that sees and hears, AI that looks things up, and AI that pauses to think before it answers.