Multi-Agent Debate
Intent
Multiple agents argue opposing positions to stress-test ideas, expose flaws in reasoning, and arrive at better conclusions.
Problem
A single agent tends to be confidently wrong: it commits to its first reasoning path and doesn't question its own assumptions. Confirmation bias is as much a problem for LLMs as it is for humans.
Solution
Create multiple agents that take different positions on a question and engage in structured debate. Each agent presents arguments, critiques the other's reasoning, and refines its position. A judge agent (or the debate structure itself) determines the winner or synthesizes the best elements of each position. This adversarial process forces arguments to withstand scrutiny, producing more robust conclusions.
Diagram
Question → [Agent A: Position 1]    [Agent B: Position 2]
                    ↓                         ↓
            [A critiques B]           [B critiques A]
                    ↓                         ↓
         [A refines position]      [B refines position]
                    └────────────┬────────────┘
                                 ↓
              [Judge: Synthesize best arguments]
                                 ↓
                           Final Answer
When to Use
- Complex decisions with multiple valid perspectives
- Risk assessment where overlooking flaws is costly
- Reducing hallucination through adversarial scrutiny
- Problems where the 'best' answer isn't obvious
When NOT to Use
- Factual questions with clear, unambiguous answers
- Latency-sensitive applications
- Simple tasks where debate adds no value
Pros & Cons
Pros
- Exposes flawed reasoning through adversarial pressure
- Reduces hallucination and overconfidence
- Produces more nuanced, well-considered outputs
- Explores problem space more thoroughly
Cons
- High token cost (multiple agents, multiple rounds)
- Agents may agree on wrong answers (shared biases)
- Debate quality depends on prompt design
- Can produce analysis paralysis on simple questions
Implementation Steps
1. Define the debate structure: number of agents, number of rounds
2. Create distinct agent personas with different perspectives
3. Implement the debate protocol: argue, critique, refine
4. Build the judge that evaluates arguments and synthesizes
5. Set termination criteria (max rounds, consensus, judge decision)
6. Log all debate rounds for transparency
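For step 5, one cheap way to detect consensus without an extra model call is to compare the agents' latest arguments for textual similarity (a judge-model check is stronger but costlier). A minimal sketch, assuming the function name and threshold are illustrative:

```python
from difflib import SequenceMatcher


def should_stop(round_num: int, max_rounds: int,
                last_a: str, last_b: str,
                consensus_threshold: float = 0.9) -> bool:
    """Terminate on max rounds, or when the two positions converge.

    String similarity is only a proxy for consensus: it catches agents
    that have settled on near-identical wording, not semantic agreement.
    """
    if round_num >= max_rounds:
        return True
    similarity = SequenceMatcher(None, last_a, last_b).ratio()
    return similarity >= consensus_threshold
```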
Real-World Example
Architecture Decision Record
Should we use microservices or a monolith? Agent A argues for microservices (scalability, team independence). Agent B argues for monolith (simplicity, faster development). After 3 rounds of debate, the judge synthesizes: 'Start with a modular monolith, extract services as scaling needs emerge.'
from openai import OpenAI

client = OpenAI()

def debate(question: str, rounds: int = 2) -> str:
    agents = [
        {"role": "Agent A", "stance": "Argue FOR"},
        {"role": "Agent B", "stance": "Argue AGAINST"},
    ]
    history = []
    for _ in range(rounds):
        for agent in agents:
            # Show each agent its opponent's most recent argument, if any.
            opponent_args = [h["argument"] for h in history if h["agent"] != agent["role"]]
            context = f"\nOpponent argued: {opponent_args[-1]}" if opponent_args else ""
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": f"You are {agent['role']}. {agent['stance']}."},
                    {"role": "user", "content": f"{question}{context}"},
                ],
            )
            history.append({"agent": agent["role"], "argument": response.choices[0].message.content})
    # Judge: synthesize a final answer from the full transcript.
    debate_log = "\n\n".join(f"{h['agent']}: {h['argument']}" for h in history)
    verdict = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Synthesize the best answer from this debate:\n\n{debate_log}"}],
    )
    return verdict.choices[0].message.content
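Step 6 calls for logging every round; the example above only keeps the transcript in memory. A minimal append-only JSONL logger might look like this (the path and record fields are illustrative, not part of the pattern):

```python
import json
import time


def log_round(path: str, round_num: int, agent: str, argument: str) -> None:
    """Append one debate turn as a JSON line for later audit."""
    record = {
        "ts": time.time(),
        "round": round_num,
        "agent": agent,
        "argument": argument,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Calling `log_round` after each `history.append` in the debate loop gives a persistent, replayable record of how the final verdict was reached.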