Multi-Agent Debate

Advanced · 👥 Multi-Agent Patterns · Du et al., 2023

Intent

Multiple agents argue opposing positions to stress-test ideas, expose flaws in reasoning, and arrive at better conclusions.

Problem

A single agent tends to be confidently wrong: it commits to its first reasoning path and does not question its own assumptions. Confirmation bias is as much a problem for LLMs as it is for humans.

Solution

Create multiple agents that take different positions on a question and engage in structured debate. Each agent presents arguments, critiques the other's reasoning, and refines its position. A judge agent (or the debate structure itself) determines the winner or synthesizes the best elements of each position. This adversarial process forces arguments to withstand scrutiny, producing more robust conclusions.

Diagram

Question → [Agent A: Position 1]  [Agent B: Position 2]
                      ↓                      ↓
              [A critiques B]        [B critiques A]
                      ↓                      ↓
              [A refines position]   [B refines position]
                      ↓                      ↓
                    [Judge: Synthesize best arguments]
                               ↓
                         Final Answer
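
The flow in the diagram can be sketched as a single pass of argue, critique, refine, and judge. This is a minimal illustration, not a reference implementation: the `llm` callable, its `(system_prompt, user_prompt) -> str` signature, the function name `debate_round`, and all prompt wording are assumptions made for the example. Any model client can be wrapped to fit that signature.

```python
from typing import Callable, Dict

# Hypothetical signature: (system_prompt, user_prompt) -> model reply.
LLM = Callable[[str, str], str]


def debate_round(question: str, positions: Dict[str, str], llm: LLM) -> str:
    """Run one argue -> critique -> refine pass, then ask a judge to synthesize."""
    names = list(positions)
    if len(names) != 2:
        raise ValueError("this sketch handles exactly two agents")
    other = {names[0]: names[1], names[1]: names[0]}
    system = {n: f"You are {n}. Your position: {positions[n]}" for n in names}

    # 1. Each agent states its opening argument.
    opening = {
        n: llm(system[n], f"Question: {question}\nState your argument.")
        for n in names
    }

    # 2. Each agent critiques the opponent's opening argument.
    critique = {
        n: llm(
            system[n],
            f"Question: {question}\nOpponent's argument:\n{opening[other[n]]}\n"
            "Critique it.",
        )
        for n in names
    }

    # 3. Each agent refines its own position using the critique it received.
    refined = {
        n: llm(
            system[n],
            f"Question: {question}\nYour argument:\n{opening[n]}\n"
            f"Critique you received:\n{critique[other[n]]}\nRefine your position.",
        )
        for n in names
    }

    # 4. The judge sees the question and the refined positions only.
    summary = "\n\n".join(f"{n}: {refined[n]}" for n in names)
    return llm(
        "You are an impartial judge.",
        f"Question: {question}\n\n{summary}\n\nSynthesize the best answer.",
    )
```

With two agents, one pass makes seven model calls: two openings, two critiques, two refinements, and one judge call.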

When to Use

  • Complex decisions with multiple valid perspectives
  • Risk assessment where overlooking flaws is costly
  • Reducing hallucination through adversarial scrutiny
  • Problems where the 'best' answer isn't obvious

When NOT to Use

  • Factual questions with clear, unambiguous answers
  • Latency-sensitive applications
  • Simple tasks where debate adds no value

Pros & Cons

Pros

  • Exposes flawed reasoning through adversarial pressure
  • Reduces hallucination and overconfidence
  • Produces more nuanced, well-considered outputs
  • Explores problem space more thoroughly

Cons

  • High token cost (multiple agents, multiple rounds)
  • Agents may agree on wrong answers (shared biases)
  • Debate quality depends on prompt design
  • Can produce analysis paralysis on simple questions
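
To make the token-cost point concrete, the number of model calls can be estimated up front. The sketch below assumes every agent speaks exactly once per round and the judge is called a fixed number of times; the function name `debate_call_count` is hypothetical. It counts calls only, and real token cost typically grows faster than this because later prompts carry earlier arguments.

```python
def debate_call_count(agents: int, rounds: int, judge_calls: int = 1) -> int:
    """Model calls for a debate where each agent speaks once per round."""
    if agents < 1 or rounds < 0 or judge_calls < 0:
        raise ValueError("agents must be >= 1; rounds and judge_calls must be >= 0")
    return agents * rounds + judge_calls


# Two agents, two rounds, one judge call, versus one call for a single agent.
print(debate_call_count(agents=2, rounds=2))
# Two agents, three rounds, one judge call.
print(debate_call_count(agents=2, rounds=3))
```

Under these assumptions the two configurations need 5 and 7 calls respectively.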

Implementation Steps

  1. Define the debate structure: number of agents, number of rounds
  2. Create distinct agent personas with different perspectives
  3. Implement the debate protocol: argue, critique, refine
  4. Build the judge that evaluates arguments and synthesizes
  5. Set termination criteria (max rounds, consensus, judge decision)
  6. Log all debate rounds for transparency
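
Steps 5 and 6 can be sketched as a loop that stops on consensus or at a round limit and records every turn. This is one possible shape under stated assumptions: `speak` and `reached_consensus` are hypothetical callables supplied by the caller (for example, a wrapper around a model call and a comparison of the agents' latest answers), and the judge-decision stop condition is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Turn:
    round: int
    agent: str
    text: str


@dataclass
class DebateResult:
    transcript: List[Turn] = field(default_factory=list)  # full log of all turns
    stopped_because: str = ""  # "consensus" or "max_rounds"


def run_debate(
    agents: Sequence[str],
    speak: Callable[[str, List[Turn]], str],
    reached_consensus: Callable[[List[str]], bool],
    max_rounds: int = 3,
) -> DebateResult:
    """Run rounds until the agents agree or max_rounds is reached."""
    result = DebateResult()
    for round_no in range(1, max_rounds + 1):
        latest: List[str] = []
        for agent in agents:
            # Pass a copy so the speaker cannot mutate the log.
            text = speak(agent, list(result.transcript))
            result.transcript.append(Turn(round_no, agent, text))
            latest.append(text)
        # Check agreement only after every agent has spoken this round.
        if reached_consensus(latest):
            result.stopped_because = "consensus"
            return result
    result.stopped_because = "max_rounds"
    return result
```

Keeping the transcript as structured `Turn` records rather than a single string makes it straightforward to audit who said what in which round, or to hand the log to a judge afterwards.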

Real-World Example

Architecture Decision Record

Should we use microservices or a monolith? Agent A argues for microservices (scalability, team independence). Agent B argues for monolith (simplicity, faster development). After 3 rounds of debate, the judge synthesizes: 'Start with a modular monolith, extract services as scaling needs emerge.'
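
The two personas in this example could be expressed as small configuration objects that render a system prompt. The `Persona` class, its field names, and the prompt wording are assumptions for illustration; only the positions and talking points come from the example above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Persona:
    name: str
    position: str
    talking_points: Tuple[str, ...]

    def system_prompt(self) -> str:
        """Render the persona as a system prompt (wording is illustrative)."""
        points = ", ".join(self.talking_points)
        return (
            f"You are {self.name}. Argue for {self.position}. "
            f"Emphasize: {points}. "
            "Address the opponent's strongest point directly."
        )


PERSONAS = (
    Persona("Agent A", "microservices", ("scalability", "team independence")),
    Persona("Agent B", "a monolith", ("simplicity", "faster development")),
)

for persona in PERSONAS:
    print(persona.system_prompt())
```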

Python: Two-Agent Debate with Judge
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def debate(question: str, rounds: int = 2) -> str:
    """Run a two-agent debate on `question`, then have a judge synthesize an answer."""
    agents = [
        {"role": "Agent A", "stance": "Argue FOR"},
        {"role": "Agent B", "stance": "Argue AGAINST"},
    ]
    history = []  # one entry per turn: {"agent": ..., "argument": ...}

    for _ in range(rounds):
        for agent in agents:
            # Show each agent only the opponent's most recent argument, if there is one.
            opponent_args = [h["argument"] for h in history if h["agent"] != agent["role"]]
            context = f"\nOpponent argued: {opponent_args[-1]}" if opponent_args else ""

            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": f"You are {agent['role']}. {agent['stance']}."},
                    {"role": "user", "content": f"{question}{context}"},
                ],
            )
            history.append({"agent": agent["role"], "argument": response.choices[0].message.content})

    # The judge needs the original question as well as the transcript;
    # without it, the synthesis has nothing to anchor to.
    debate_log = "\n\n".join(f"{h['agent']}: {h['argument']}" for h in history)
    verdict = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Question: {question}\n\nSynthesize the best answer from this debate:\n\n{debate_log}",
        }],
    )
    return verdict.choices[0].message.content

References