Multi-Agent Debate
Intent
Multiple agents argue opposing positions to stress-test ideas, expose flaws in reasoning, and arrive at better conclusions.
Problem
A single agent tends to be confidently wrong: it commits to its first reasoning path and doesn't question its own assumptions. Confirmation bias is as much a problem for LLMs as it is for humans.
Solution
Create multiple agents that take different positions on a question and engage in structured debate. Each agent presents arguments, critiques the other's reasoning, and refines its position. A judge agent (or the debate structure itself) determines the winner or synthesizes the best elements of each position. This adversarial process forces arguments to withstand scrutiny, producing more robust conclusions.
Diagram
Question → [Agent A: Position 1]    [Agent B: Position 2]
                    ↓                         ↓
            [A critiques B]           [B critiques A]
                    ↓                         ↓
         [A refines position]      [B refines position]
                    └────────────┬────────────┘
                                 ↓
              [Judge: Synthesize best arguments]
                                 ↓
                           Final Answer
When to Use
- Complex decisions with multiple valid perspectives
- Risk assessment where overlooking flaws is costly
- Reducing hallucination through adversarial scrutiny
- Problems where the 'best' answer isn't obvious
When NOT to Use
- Factual questions with clear, unambiguous answers
- Latency-sensitive applications
- Simple tasks where debate adds no value
Pros & Cons
Pros
- Exposes flawed reasoning through adversarial pressure
- Reduces hallucination and overconfidence
- Produces more nuanced, well-considered outputs
- Explores problem space more thoroughly
Cons
- High token cost (multiple agents, multiple rounds)
- Agents may agree on wrong answers (shared biases)
- Debate quality depends on prompt design
- Can produce analysis paralysis on simple questions
Implementation Steps
1. Define the debate structure: number of agents, number of rounds
2. Create distinct agent personas with different perspectives
3. Implement the debate protocol: argue, critique, refine
4. Build the judge that evaluates arguments and synthesizes
5. Set termination criteria (max rounds, consensus, judge decision)
6. Log all debate rounds for transparency
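For step 5, one cheap way to detect consensus without an extra model call is to compare the agents' latest arguments for textual similarity (a judge-model check is stronger but costlier). A minimal sketch, assuming the function name and threshold are illustrative:

```python
from difflib import SequenceMatcher


def should_stop(round_num: int, max_rounds: int,
                last_a: str, last_b: str,
                consensus_threshold: float = 0.9) -> bool:
    """Terminate on max rounds, or when the two positions converge.

    String similarity is only a proxy for consensus: it catches agents
    that have settled on near-identical wording, not semantic agreement.
    """
    if round_num >= max_rounds:
        return True
    similarity = SequenceMatcher(None, last_a, last_b).ratio()
    return similarity >= consensus_threshold
```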
Real-World Example
Architecture Decision Record
Should we use microservices or a monolith? Agent A argues for microservices (scalability, team independence). Agent B argues for monolith (simplicity, faster development). After 3 rounds of debate, the judge synthesizes: 'Start with a modular monolith, extract services as scaling needs emerge.'
from openai import OpenAI

client = OpenAI()

def debate(question: str, rounds: int = 2) -> str:
    agents = [
        {"role": "Agent A", "stance": "Argue FOR"},
        {"role": "Agent B", "stance": "Argue AGAINST"},
    ]
    history = []
    for _ in range(rounds):
        for agent in agents:
            # Show each agent its opponent's most recent argument, if any.
            opponent_args = [h["argument"] for h in history if h["agent"] != agent["role"]]
            context = f"\nOpponent argued: {opponent_args[-1]}" if opponent_args else ""
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": f"You are {agent['role']}. {agent['stance']}."},
                    {"role": "user", "content": f"{question}{context}"},
                ],
            )
            history.append({"agent": agent["role"], "argument": response.choices[0].message.content})
    # Judge: synthesize a final answer from the full transcript.
    debate_log = "\n\n".join(f"{h['agent']}: {h['argument']}" for h in history)
    verdict = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Synthesize the best answer from this debate:\n\n{debate_log}"}],
    )
    return verdict.choices[0].message.content
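Step 6 calls for logging every round; the example above only keeps the transcript in memory. A minimal append-only JSONL logger might look like this (the path and record fields are illustrative, not part of the pattern):

```python
import json
import time


def log_round(path: str, round_num: int, agent: str, argument: str) -> None:
    """Append one debate turn as a JSON line for later audit."""
    record = {
        "ts": time.time(),
        "round": round_num,
        "agent": agent,
        "argument": argument,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Calling `log_round` after each `history.append` in the debate loop gives a persistent, replayable record of how the final verdict was reached.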