ptrnsai

Unsandboxed Execution

Advanced · 🚫 Anti-Pattern · ⚙️ Anti-Patterns: Tool Use · Microsoft Research / Industry observation
🚫 Anti-Pattern: This describes a common mistake to avoid, not a pattern to follow.

The Anti-Pattern

Giving agents direct access to production systems, real databases, or real-world actions without sandboxing, rollback, or approval gates.

Why It Happens

During development, it’s tempting to give agents real credentials for faster iteration. Agents then execute destructive operations (deleting data, sending emails, making purchases, deploying code) with no way to undo them. The failure mode isn’t ‘agent doesn’t work’; it’s ‘agent works too well on the wrong thing.’ By the time you realize what happened, the damage is done.

How to Fix It

Always sandbox agent actions in development and testing. Use human-in-the-loop approval for high-risk operations. Implement dry-run mode for every destructive action. Classify all actions by risk level and require escalating approval thresholds. Apply the principle of least privilege: agents get only the permissions they need, nothing more. Design for the worst case: what’s the most damage this agent could do? Then make that impossible.
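
As a rough illustration of the dry-run idea, the sketch below wraps a destructive action so that, by default, it is only recorded and never executed. The `DryRunExecutor` class and the `records` dictionary are hypothetical names invented for this example; they are not part of any particular agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DryRunExecutor:
    """Hypothetical helper: executes actions only when dry_run is False."""

    dry_run: bool = True  # safe default: describe the action, do not run it
    planned: List[str] = field(default_factory=list)

    def run(self, description: str, action: Callable[[], object]) -> object:
        if self.dry_run:
            # Record what would have happened instead of doing it.
            self.planned.append(description)
            return None
        return action()


records = {"alice": 1, "bob": 2}  # stand-in for real data

executor = DryRunExecutor()
executor.run("delete all records", records.clear)

print(executor.planned)  # ['delete all records']
print(len(records))      # 2, nothing was deleted
```

Making dry-run the default means a forgotten flag fails safe: someone has to opt in explicitly before anything destructive runs.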

Diagram

  Unsandboxed (dangerous):            Sandboxed (safe):
  ┌───────┐    ┌────────────┐        ┌───────┐    ┌─────────┐
  │ Agent │───▶│ Production │        │ Agent │───▶│ Sandbox │
  │       │    │ Database   │        │       │    │  (copy) │
  └───────┘    └────────────┘        └───────┘    └─────────┘
       │            │                     │            │
  DROP TABLE users  │                  DROP TABLE users │
       │            ▼                     │            ▼
       │     💀 Data gone forever         │     ✓ Only sandbox affected
       │                                  │
       │                                  ▼
       │                            [Human review]
       │                                  │
       │                            'Looks wrong, reject'
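
The sandboxed half of the diagram can be approximated with Python's standard `sqlite3` module. This is a minimal sketch, assuming SQLite as a stand-in for the real database: the "production" connection is in-memory so the example is self-contained, and `make_sandbox` is a hypothetical helper name.

```python
import sqlite3


def make_sandbox(source: sqlite3.Connection) -> sqlite3.Connection:
    """Return an in-memory copy of `source`; changes to the copy never reach it."""
    sandbox = sqlite3.connect(":memory:")
    source.backup(sandbox)  # copies the whole database into the sandbox
    return sandbox


# Stand-in for the production database (assumption: in-memory for the demo).
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
production.execute("INSERT INTO users (name) VALUES ('alice'), ('bob')")
production.commit()

sandbox = make_sandbox(production)
sandbox.execute("DROP TABLE users")  # the destructive statement hits only the copy

remaining = production.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(remaining)  # 2
```

The agent is handed only the `sandbox` connection, so even a successful `DROP TABLE` leaves the original rows in place for a human to review.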

Symptoms

  • Agents have write access to production databases or APIs
  • No approval workflow exists for destructive operations
  • Testing happens directly against production systems
  • Agent errors result in real-world consequences that can’t be reversed

False Positives

  • Agents operating in read-only mode with no write capabilities
  • Well-tested agents with comprehensive approval workflows already in place
  • Development environments that are already properly isolated

Warning Signs & Consequences

Warning Signs

  • Agent performing write operations on production without any review step
  • No separation between dev/staging/production for agent access
  • Credentials with broad permissions given directly to agents
  • Lack of audit trail for agent-initiated actions

Consequences

  • Irreversible damage from agent errors: data loss, wrong emails sent, bad deployments
  • Compliance and regulatory violations from uncontrolled access
  • Impossible debugging when the agent has modified the system it’s operating on
  • Complete loss of user trust after a catastrophic agent-initiated incident

Remediation Steps

  1. Classify all agent actions by risk level: read-only, low-risk write, high-risk write
  2. Sandbox all development and testing; never test against production
  3. Implement dry-run mode that shows what would happen without executing
  4. Add human-in-the-loop approval for all high-risk operations
  5. Apply principle of least privilege: minimal permissions, explicit scope
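
Steps 1 and 4 can be combined into a small policy check. The sketch below is one possible shape, not a prescribed implementation: the action names, the `ACTION_RISK` table, and the `approve` callback are all hypothetical, and unknown actions are deliberately treated as high risk.

```python
from enum import IntEnum
from typing import Callable, Dict


class Risk(IntEnum):
    READ_ONLY = 0
    LOW_RISK_WRITE = 1
    HIGH_RISK_WRITE = 2


# Hypothetical action names; anything not listed is treated as high risk.
ACTION_RISK: Dict[str, Risk] = {
    "read_ticket": Risk.READ_ONLY,
    "add_internal_note": Risk.LOW_RISK_WRITE,
    "delete_customer": Risk.HIGH_RISK_WRITE,
}


def is_allowed(action: str, approve: Callable[[str], bool]) -> bool:
    """Return True if the action may proceed under this policy."""
    risk = ACTION_RISK.get(action, Risk.HIGH_RISK_WRITE)
    if risk == Risk.HIGH_RISK_WRITE:
        return approve(action)  # human-in-the-loop decision
    return True


def deny_all(action: str) -> bool:
    """Stand-in reviewer that rejects every request."""
    return False


print(is_allowed("read_ticket", deny_all))      # True
print(is_allowed("delete_customer", deny_all))  # False
print(is_allowed("unknown_action", deny_all))   # False
```

Defaulting unlisted actions to the highest risk level keeps a newly added tool from bypassing review just because nobody classified it yet.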

Real-World Example

The Accidental Email Blast

A customer service agent is given access to the email API for sending individual support responses. Due to a prompt injection in a customer ticket, the agent sends the same response to every customer in the database: 50,000 emails in 3 minutes. A human-in-the-loop approval gate for batch operations (more than one recipient) would have stopped the blast before it went out.
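
A guard of the kind described here might look like the following sketch. It assumes the email API is reachable through a plain `send(address, body)` callable; `guarded_send`, `ApprovalRequired`, and `fake_send` are hypothetical names for illustration, and the one-recipient threshold is taken from the example above.

```python
from typing import Callable, List

MAX_UNAPPROVED_RECIPIENTS = 1  # threshold from the example: more than one needs approval


class ApprovalRequired(Exception):
    """Raised when a send needs human sign-off before it can go out."""


def guarded_send(
    recipients: List[str],
    body: str,
    send: Callable[[str, str], None],
    approved: bool = False,
) -> int:
    """Send `body` to each unique recipient, refusing unapproved batches."""
    unique = sorted(set(recipients))
    if len(unique) > MAX_UNAPPROVED_RECIPIENTS and not approved:
        raise ApprovalRequired(f"{len(unique)} recipients need human approval")
    for address in unique:
        send(address, body)
    return len(unique)


outbox: List[str] = []


def fake_send(address: str, body: str) -> None:
    """Stand-in for the real email API; only records the address."""
    outbox.append(address)


guarded_send(["a@example.com"], "Your ticket is resolved.", fake_send)

try:
    guarded_send(["a@example.com", "b@example.com"], "Same reply", fake_send)
except ApprovalRequired as exc:
    print(exc)  # 2 recipients need human approval

print(outbox)  # ['a@example.com']
```

The check runs before any message is sent, so a rejected batch delivers nothing rather than stopping partway through the list.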
