
Human-in-the-Loop

Basic · 🛡️ Guardrails & Safety · Industry practice

Intent

Require human approval at critical decision points, balancing agent autonomy with human oversight.

Problem

Fully autonomous agents can make irreversible mistakes — deleting data, sending emails, making purchases. The cost of an error may far exceed the value of the automation. You need to maintain human control over high-stakes actions while still getting the efficiency benefits of automation.

Solution

Insert approval gates at critical points in the agent's workflow. The agent operates autonomously for low-risk actions but pauses and requests human approval before high-risk ones. The threshold for what requires approval can be adjusted based on trust and risk tolerance. This creates a spectrum from fully manual (everything needs approval) to fully autonomous (nothing does), with most production systems somewhere in between.
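The adjustable threshold can be pictured as a single setting that slides between the two extremes. The following is a minimal sketch rather than a prescribed design: the `Risk` levels, the `needs_approval` helper, and the convention that `None` means "auto-approve nothing" are all assumptions made for illustration.

```python
from enum import IntEnum
from typing import Optional


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def needs_approval(risk: Risk, auto_approve_up_to: Optional[Risk]) -> bool:
    """Return True when a human must sign off.

    auto_approve_up_to=None -> fully manual (every action is gated)
    auto_approve_up_to=HIGH -> fully autonomous (nothing is gated)
    """
    if auto_approve_up_to is None:
        return True
    return risk > auto_approve_up_to


# The same three risk levels under three different trust settings.
for setting in (None, Risk.LOW, Risk.HIGH):
    gated = [r.name for r in Risk if needs_approval(r, setting)]
    label = setting.name if setting is not None else "NONE"
    print(f"auto-approve up to {label}: gated = {gated}")
```

Raising the setting one level at a time is one way to loosen oversight gradually as the agent earns trust.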

Diagram

Agent executing task...
    ├── [Read file] → Low risk → Auto-approve ✓
    ├── [Search web] → Low risk → Auto-approve ✓
    ├── [Delete database records] → HIGH RISK
    │       ↓
    │   [Pause: Request human approval]
    │       ↓
    │   Human: Approve / Reject / Modify
    │       ↓
    └── [Continue or abort based on decision]
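The diagram's three-way outcome (approve, reject, modify) could be handled roughly as follows. This is a sketch under stated assumptions: `Decision`, `Approver`, `run_gated_action`, and the stand-in `cautious_human` reviewer are hypothetical names, and a real approver would block on a UI or chat prompt instead of returning immediately.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    verdict: str                           # "approve", "reject", or "modify"
    modified_action: Optional[str] = None  # only used with "modify"


# An approver is any callable that takes the proposed action and returns a
# Decision. In production this might wrap a web UI or a chat prompt.
Approver = Callable[[str], Decision]


def run_gated_action(action: str, high_risk: bool, approver: Approver) -> str:
    """Execute low-risk actions directly; pause high-risk ones for a human."""
    if not high_risk:
        return f"executed: {action}"

    decision = approver(action)  # the agent waits here until a human answers
    if decision.verdict == "approve":
        return f"executed: {action}"
    if decision.verdict == "modify" and decision.modified_action:
        return f"executed (modified): {decision.modified_action}"
    # Anything else, including an unrecognized verdict, aborts (fail closed).
    return f"aborted: {action}"


# Stand-in reviewer for demonstration: softens a delete instead of rejecting it.
def cautious_human(action: str) -> Decision:
    if "delete" in action:
        return Decision("modify", "archive database records")
    return Decision("approve")


print(run_gated_action("read file", high_risk=False, approver=cautious_human))
print(run_gated_action("delete database records", high_risk=True, approver=cautious_human))
```

Treating every verdict other than an explicit approve or a well-formed modify as a rejection keeps the gate safe when the approval channel returns something unexpected.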

When to Use

  • Any production system with irreversible or high-stakes actions
  • Early deployment when you're building trust in the agent
  • Regulated industries with compliance requirements
  • Actions involving financial transactions, communications, or data modification

When NOT to Use

  • Fully offline batch processing where human review adds no value
  • Time-critical systems where human approval would cause unacceptable delay
  • When the agent has been thoroughly validated for the specific task

Pros & Cons

Pros

  • Prevents catastrophic autonomous errors
  • Builds trust incrementally
  • Maintains accountability — human signs off on critical actions
  • Flexible: adjust the approval threshold as trust grows

Cons

  • Adds latency for human response
  • Doesn't scale well if too many actions need approval
  • Humans may rubber-stamp approvals (approval fatigue)
  • Requires building an approval UI and notification system

Implementation Steps

  1. Classify all agent actions by risk level (low, medium, high, critical)
  2. Define approval requirements for each level
  3. Build the pause/resume mechanism in the agent's execution loop
  4. Create the approval interface (Slack notification, web UI, email)
  5. Include enough context in approval requests for informed decisions
  6. Log all approvals and rejections for audit purposes
  7. Monitor approval rates: if nearly every request is approved, the thresholds may be too strict or reviewers may be rubber-stamping
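Steps 5 to 7 above can be sketched together as a small audit trail. This is an illustrative outline only: `ApprovalRecord`, `AuditLog`, the sample entries, and the `REVIEW_THRESHOLD` value are all assumptions, and a real system would persist records rather than hold them in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class ApprovalRecord:
    action: str
    risk: str
    context: str   # what the reviewer saw when deciding (step 5)
    approved: bool
    reviewer: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class AuditLog:
    """In-memory audit trail (step 6)."""

    def __init__(self) -> None:
        self.records: List[ApprovalRecord] = []

    def record(self, entry: ApprovalRecord) -> None:
        self.records.append(entry)

    def approval_rate(self, risk: str) -> float:
        """Fraction of requests at a risk level that were approved (step 7)."""
        relevant = [r for r in self.records if r.risk == risk]
        if not relevant:
            return 0.0
        return sum(r.approved for r in relevant) / len(relevant)


log = AuditLog()
log.record(ApprovalRecord("send newsletter", "medium", "1,200 recipients", True, "alice"))
log.record(ApprovalRecord("send invoice", "medium", "single customer", True, "alice"))
log.record(ApprovalRecord("delete user table", "high", "production database", False, "bob"))

# REVIEW_THRESHOLD is an arbitrary illustrative value, not a recommendation.
REVIEW_THRESHOLD = 0.95
for level in ("medium", "high"):
    rate = log.approval_rate(level)
    if rate >= REVIEW_THRESHOLD:
        print(f"{level}: {rate:.0%} approved; gate may be too strict or rubber-stamped")
    else:
        print(f"{level}: {rate:.0%} approved")
```

A consistently high approval rate at one level is a prompt to investigate, not proof that the gate can be removed; the logged context helps distinguish careful approvals from reflexive ones.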

Real-World Example

Customer Refund Agent

The agent handles refund requests autonomously for amounts under $50 (low risk). Refunds of $50 to $500 require supervisor approval, and refunds over $500 require manager approval. The agent prepares the refund with all context, presents it for approval, and executes upon sign-off.
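The tiering in this example might be routed with a function like the one below. It is a sketch, not the system's actual logic: `required_approver` and `describe` are hypothetical names, and the handling of the exact boundary amounts ($50 and $500 both fall in the supervisor tier) is an assumption.

```python
from decimal import Decimal
from typing import Optional


def required_approver(amount: Decimal) -> Optional[str]:
    """Map a refund amount to the role that must sign off, or None if auto-approved.

    Boundary handling is an assumption: exactly $50 and exactly $500 are
    treated as part of the supervisor tier.
    """
    if amount < Decimal("50"):
        return None
    if amount <= Decimal("500"):
        return "supervisor"
    return "manager"


def describe(amount: str) -> str:
    approver = required_approver(Decimal(amount))
    if approver is None:
        return f"${amount}: refund issued automatically"
    return f"${amount}: prepared, awaiting {approver} approval"


for amt in ("19.99", "50.00", "500.00", "750.00"):
    print(describe(amt))
```

`Decimal` is used instead of `float` so that comparisons at the tier boundaries are exact.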

Python: Risk-Based Approval Gate
from enum import Enum

class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

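# Naive keyword-based classification, for illustration only.
RISK_KEYWORDS = {"delete": Risk.HIGH, "transfer": Risk.HIGH, "send": Risk.MEDIUM}

def assess_risk(action: str) -> Risk:
    """Return the highest risk level among the keywords found in the action."""
    lowered = action.lower()
    matches = [level for keyword, level in RISK_KEYWORDS.items() if keyword in lowered]
    # Take the maximum so the result does not depend on dictionary order.
    return max(matches, key=lambda level: level.value, default=Risk.LOW)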

def execute_with_approval(action: str, auto_approve_up_to: Risk = Risk.LOW) -> str:
    risk = assess_risk(action)

    if risk.value > auto_approve_up_to.value:
        print(f"[{risk.name} RISK] Action: {action}")
        if input("Approve? (yes/no): ").strip().lower() != "yes":
            return "Rejected by human"

    return f"Executed: {action}"

# Low-risk: auto-approved. High-risk: prompts for human approval on stdin.
print(execute_with_approval("read user profile"))
print(execute_with_approval("delete all user data"))

References