Human-in-the-Loop

Basic · 🛡️ Guardrails & Safety · Industry practice

Intent

Require human approval at critical decision points, balancing agent autonomy with human oversight.

Problem

Fully autonomous agents can make irreversible mistakes, such as deleting data, sending emails, or making purchases. The cost of a single error may far exceed the value of the automation. You need to keep human control over high-stakes actions while still getting the efficiency benefits of automation.

Solution

Insert approval gates at critical points in the agent's workflow. The agent operates autonomously for low-risk actions but pauses and requests human approval before high-risk ones. The threshold for what requires approval can be adjusted based on trust and risk tolerance. This creates a spectrum from fully manual (everything needs approval) to fully autonomous (nothing does), with most production systems somewhere in between.

Diagram

Agent executing task...
    ├── [Read file] → Low risk → Auto-approve ✓
    ├── [Search web] → Low risk → Auto-approve ✓
    ├── [Delete database records] → HIGH RISK
    │       ↓
    │   [Pause: Request human approval]
    │       ↓
    │   Human: Approve / Reject / Modify
    │       ↓
    └── [Continue or abort based on decision]
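
The three outcomes in the diagram (Approve / Reject / Modify) can be sketched as a small decision handler. This is a minimal illustration rather than a prescribed design: the names `Verdict`, `Decision`, and `resolve` are hypothetical, and the reviewer's decision is passed in as data instead of being collected from a real approval UI.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    MODIFY = "modify"


@dataclass
class Decision:
    verdict: Verdict
    # Replacement action supplied by the reviewer; only used for MODIFY.
    modified_action: Optional[str] = None


def resolve(action: str, decision: Decision) -> Optional[str]:
    """Return the action to execute, or None if the agent should abort."""
    if decision.verdict is Verdict.APPROVE:
        return action
    if decision.verdict is Verdict.MODIFY:
        if not decision.modified_action:
            raise ValueError("MODIFY requires a replacement action")
        return decision.modified_action
    return None  # REJECT: abort this step


# Hypothetical actions, used only to exercise the three branches.
proposed = "delete database records older than 30 days"
print(resolve(proposed, Decision(Verdict.APPROVE)))
print(resolve(proposed, Decision(Verdict.REJECT)))
print(resolve(proposed, Decision(Verdict.MODIFY, "archive database records older than 30 days")))
```

Treating Modify as "run the reviewer's replacement instead" is one possible reading; a stricter system might send the modified action back through risk assessment before executing it.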

When to Use

Any production system with irreversible or high-stakes actions
Early deployment when you're building trust in the agent
Regulated industries with compliance requirements
Actions involving financial transactions, communications, or data modification

When NOT to Use

Fully offline batch processing where human review adds no value
Time-critical systems where human approval would cause unacceptable delay
When the agent has been thoroughly validated for the specific task

Pros & Cons

Pros

Prevents catastrophic autonomous errors
Builds trust incrementally
Maintains accountability: a human signs off on critical actions
Flexible: adjust the approval threshold as trust grows

Cons

Adds latency for human response
Doesn't scale well if too many actions need approval
Humans may rubber-stamp approvals (approval fatigue)
Requires building an approval UI and notification system

Implementation Steps

1. Classify all agent actions by risk level (low, medium, high, critical)
2. Define approval requirements for each level
3. Build the pause/resume mechanism in the agent's execution loop
4. Create the approval interface (Slack notification, web UI, email)
5. Include enough context in approval requests for informed decisions
6. Log all approvals and rejections for audit purposes
7. Monitor approval rates: if nearly every request is approved, the thresholds may be too strict
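
Steps 3, 5, 6, and 7 can be sketched together as an in-memory approval gate. This is a rough sketch under stated assumptions: the `ApprovalGate` and `ApprovalRequest` classes, their field names, and the reviewer names are all hypothetical. A production system would persist pending requests and deliver them through the chosen approval interface rather than holding them in a dictionary.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ApprovalRequest:
    request_id: int
    action: str
    risk: str
    context: Dict[str, str]  # what the reviewer needs to decide (step 5)
    status: str = "pending"  # pending -> approved | rejected


@dataclass
class ApprovalGate:
    pending: Dict[int, ApprovalRequest] = field(default_factory=dict)
    audit_log: List[dict] = field(default_factory=list)
    next_id: int = 1

    def pause(self, action: str, risk: str, context: Dict[str, str]) -> int:
        """Park the action and return an id the agent can resume with (step 3)."""
        request = ApprovalRequest(self.next_id, action, risk, context)
        self.pending[request.request_id] = request
        self.next_id += 1
        return request.request_id

    def resume(self, request_id: int, approved: bool, reviewer: str) -> ApprovalRequest:
        """Record the human decision and release the parked action (steps 3 and 6)."""
        request = self.pending.pop(request_id)
        request.status = "approved" if approved else "rejected"
        self.audit_log.append({
            "request_id": request_id,
            "action": request.action,
            "risk": request.risk,
            "status": request.status,
            "reviewer": reviewer,
            "timestamp": time.time(),
        })
        return request

    def approval_rate(self) -> float:
        """Share of decided requests that were approved (step 7)."""
        if not self.audit_log:
            return 0.0
        approved = sum(1 for entry in self.audit_log if entry["status"] == "approved")
        return approved / len(self.audit_log)


# Hypothetical requests and reviewers, for illustration only.
gate = ApprovalGate()
first = gate.pause("delete 120 stale records", "high", {"table": "sessions", "reason": "retention policy"})
second = gate.pause("send invoice email", "medium", {"recipient": "customer on file"})
gate.resume(first, approved=True, reviewer="alice")
gate.resume(second, approved=False, reviewer="bob")
print(gate.approval_rate())
```

Keeping the audit log as the single source for the approval rate means the monitoring number cannot drift from what was actually decided.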

Real-World Example

Customer Refund Agent

The agent handles refund requests autonomously for amounts under $50 (low risk). Refunds of $50 to $500 require supervisor approval. Refunds over $500 require manager approval. The agent prepares the refund with all context, presents it for approval, and executes it upon sign-off.
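
The refund tiers above can be expressed as a simple routing function. This is a sketch, not the implementation of any real system: `required_approver` and its return labels are hypothetical, and treating exactly $50 and exactly $500 as supervisor-level is one reading of the stated ranges.

```python
def required_approver(amount: float) -> str:
    """Map a refund amount in dollars to the approval tier from the example."""
    if amount < 0:
        raise ValueError("refund amount must be non-negative")
    if amount < 50:
        return "auto"        # under $50: agent refunds autonomously
    if amount <= 500:
        return "supervisor"  # $50 to $500: supervisor approval
    return "manager"         # over $500: manager approval


for amount in (19.99, 50, 500, 500.01):
    print(amount, required_approver(amount))
```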

Python: Risk-Based Approval Gate

from enum import Enum

class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

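# Toy keyword-to-risk map, matched as substrings of the action text.
RISK_KEYWORDS = {"delete": Risk.HIGH, "transfer": Risk.HIGH, "send": Risk.MEDIUM}

def assess_risk(action: str) -> Risk:
    """Return the highest risk among all matching keywords, or LOW if none match."""
    text = action.lower()
    matches = [level for keyword, level in RISK_KEYWORDS.items() if keyword in text]
    return max(matches, key=lambda level: level.value, default=Risk.LOW)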

def execute_with_approval(action: str, auto_approve_up_to: Risk = Risk.LOW) -> str:
    risk = assess_risk(action)

    if risk.value > auto_approve_up_to.value:
        print(f"[{risk.name} RISK] Action: {action}")
        if input("Approve? (yes/no): ").strip().lower() != "yes":
            return "Rejected by human"

    return f"Executed: {action}"

# Low-risk: auto-approved. High-risk: pauses for human approval on stdin.
print(execute_with_approval("read user profile"))
print(execute_with_approval("delete all user data"))

References

Building Effective Agents — Anthropic