🛡️
Guardrails & Safety
Mechanisms that keep agents reliable, aligned, and safe — from input validation to human oversight.
3 patterns
Human-in-the-Loop
Basic
Require human approval at critical decision points, balancing agent autonomy with human oversight.
Input/Output Guardrails
Basic
Validate and filter both inputs to and outputs from the LLM to prevent misuse, ensure quality, and block harmful content.
Constitutional AI
Advanced
Guide agent behavior through an explicit set of principles (a 'constitution') that the agent self-enforces through critique and revision.
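As a rough illustration of the first two patterns above, the sketch below pairs a simple input/output guardrail with a human approval gate. It is a minimal sketch under stated assumptions, not a reference implementation: the names `check_input`, `filter_output`, `run_action`, and `CRITICAL_ACTIONS`, along with the regex rules, are hypothetical and chosen only for this example. A production system would likely use richer classifiers and a real review workflow.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical rules, for illustration only.
BLOCKED_INPUT = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CRITICAL_ACTIONS = {"delete_records", "send_payment"}  # assumed "critical decision points"


@dataclass
class Decision:
    allowed: bool
    reason: str


def check_input(prompt: str) -> Decision:
    """Input guardrail: reject prompts that match a known misuse pattern."""
    if BLOCKED_INPUT.search(prompt):
        return Decision(False, "possible prompt injection")
    return Decision(True, "ok")


def filter_output(text: str) -> str:
    """Output guardrail: redact email addresses before text leaves the system."""
    return EMAIL.sub("[redacted email]", text)


def run_action(action: str, approve: Callable[[str], bool]) -> str:
    """Human-in-the-loop gate: critical actions run only if the reviewer approves."""
    if action in CRITICAL_ACTIONS and not approve(action):
        return f"{action}: rejected by reviewer"
    return f"{action}: executed"
```

In an interactive setting, `approve` could be a callback that prompts a human reviewer, for example `lambda a: input(f"Approve {a}? [y/N] ").lower() == "y"`. Non-critical actions bypass the gate, which is one way to trade agent autonomy against oversight.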