🛡️

Guardrails & Safety

Mechanisms that keep agents reliable, aligned, and safe — from input validation to human oversight.

3 patterns

Require human approval at critical decision points, balancing agent autonomy with human oversight.
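As a rough illustration, an approval gate can be sketched in a few lines of Python. The `Action` type, its `critical` flag, and the `approve` and `execute` callbacks are hypothetical names chosen for this sketch, not part of any particular framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    name: str
    critical: bool  # hypothetical flag: True means a human must sign off first


def run_with_approval(
    action: Action,
    execute: Callable[[Action], str],
    approve: Callable[[Action], bool],
) -> str:
    """Run the action, asking a human first only when it is marked critical."""
    if action.critical and not approve(action):
        return f"blocked: {action.name}"
    return execute(action)
```

In practice `approve` might prompt an operator or wait on a review queue; non-critical actions pass straight through, which is where the balance between autonomy and oversight is set.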

Validate and filter both inputs to and outputs from the LLM to prevent misuse, ensure quality, and block harmful content.
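One possible shape for this, assuming the model is exposed as a plain `llm(prompt) -> str` callable: check the prompt before the call and scrub the response after it. The two regular expressions and the `max_len` limit are illustrative placeholders, not a recommended rule set.

```python
import re
from typing import Callable

# Illustrative rules only; real deployments need much broader checks.
BLOCKED_INPUT = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def guarded_call(prompt: str, llm: Callable[[str], str], max_len: int = 2000) -> str:
    """Validate the prompt, call the model, then filter the output."""
    # Input guardrail: reject oversized prompts and a known injection phrase.
    if len(prompt) > max_len or BLOCKED_INPUT.search(prompt):
        raise ValueError("input rejected by guardrail")
    output = llm(prompt)
    # Output guardrail: redact anything that looks like an email address.
    return EMAIL.sub("[redacted email]", output)
```

Keeping both checks in one wrapper means every call path goes through the same rules, whichever part of the agent issues the request.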

Guide agent behavior through an explicit set of principles (a 'constitution') that the agent self-enforces through critique and revision.
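The critique-and-revision loop could look roughly like the sketch below. Each principle is modelled here as a function that returns a critique string or `None`; in a real agent both the critique and the `revise` step would typically be LLM calls. All names are assumptions made for illustration.

```python
from typing import Callable, List, Optional

# A principle inspects a draft and returns a critique, or None if it is satisfied.
Principle = Callable[[str], Optional[str]]


def critique_and_revise(
    draft: str,
    principles: List[Principle],
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> str:
    """Revise the draft until no principle objects or the round limit is hit."""
    for _ in range(max_rounds):
        critiques = [c for p in principles if (c := p(draft)) is not None]
        if not critiques:
            return draft
        draft = revise(draft, critiques)
    return draft  # may still violate a principle if max_rounds was exhausted
```

The round limit is a design choice: it bounds cost and avoids endless loops, at the price of sometimes returning a draft that still draws a critique.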