Consensus Voting
Intent
Multiple agents independently solve the same problem, then vote to determine the best answer.
Problem
Any single agent can produce a wrong answer. Relying on one output for critical decisions is risky. You need a way to increase confidence without simply retrying the same approach.
Solution
Deploy multiple agents (potentially with different models, prompts, or temperatures) to independently solve the same problem. Collect all outputs and apply a voting mechanism: majority vote for discrete answers, or a scoring/ranking system for complex outputs. The consensus answer is typically more reliable than any individual response. This extends Self-Consistency from a single-model technique to a multi-agent architecture.
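For discrete answers, the aggregation step can be as small as a counter over the agents' extracted answers. A minimal sketch of majority voting; the answer strings below are hypothetical stand-ins for real agent outputs:

```python
from collections import Counter

def majority_vote(answers: list[str]) -> tuple[str, float]:
    """Return the most common answer and its share of the vote."""
    # Normalize so superficially different strings ("Urgent ", "urgent")
    # count as the same ballot.
    normalized = [a.strip().lower() for a in answers]
    votes = Counter(normalized)
    winner, count = votes.most_common(1)[0]
    return winner, count / len(normalized)

# Hypothetical outputs from four independent agents:
answer, confidence = majority_vote(["Urgent", "urgent ", "Routine", "Urgent"])
print(answer, confidence)  # urgent 0.75
```

The returned share doubles as the confidence signal: 1.0 means a unanimous vote, values near 1/k for k candidates mean an even split.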
Diagram
       ┌→ [Agent 1 (GPT-4)] → Answer A
       ├→ [Agent 2 (Claude)] → Answer A
Task → [Distribute]
       ├→ [Agent 3 (Gemini)] → Answer B
       └→ [Agent 4 (GPT-4, different prompt)] → Answer A
Vote: A wins (3:1) → Final Answer: A
When to Use
- Critical decisions where accuracy is paramount
- When you have access to multiple models or configurations
- Quality assurance for high-stakes outputs
- Reducing model-specific biases through diversity
When NOT to Use
- Low-stakes tasks where one agent is sufficient
- When cost is more important than accuracy
- Creative tasks where diversity of output is desired
Pros & Cons
Pros
- Higher accuracy than any single agent
- Reduces model-specific biases
- Vote margin provides a confidence signal
- Simple aggregation logic
Cons
- N× cost increase
- Only works for tasks with definitive answers
- All agents might share the same blind spots
- Latency limited by the slowest agent
Implementation Steps
1. Select 3-5 diverse agents (different models, prompts, or configurations)
2. Send the same task to all agents in parallel
3. Collect responses and extract comparable answers
4. Apply voting: majority vote, weighted vote, or ranked choice
5. Use the vote margin as confidence: unanimous = high, split = low
6. For low-confidence results, escalate to human review
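Step 4 admits variants beyond simple majority. One option is a weighted vote, where agents with a stronger track record count for more; the per-agent weights below are hypothetical stand-ins for, e.g., historical accuracy scores:

```python
from collections import defaultdict

def weighted_vote(ballots: list[tuple[str, float]]) -> dict:
    """ballots: (answer, weight) pairs, one per agent."""
    totals: dict[str, float] = defaultdict(float)
    for answer, weight in ballots:
        totals[answer] += weight
    winner = max(totals, key=totals.get)
    # The winner's weight share doubles as the confidence signal (step 5).
    confidence = totals[winner] / sum(totals.values())
    return {"answer": winner, "confidence": confidence, "totals": dict(totals)}

# Three agents say A, one says B, with hypothetical reliability weights:
result = weighted_vote([("A", 0.9), ("A", 0.7), ("B", 0.8), ("A", 0.6)])
print(result["answer"], round(result["confidence"], 2))  # A 0.73
```

With equal weights this reduces to a plain majority vote, so the same aggregator covers both cases.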
Real-World Example
Medical Document Classification
A clinical document needs to be classified by urgency. Three agents with different specializations independently classify it. Two say 'urgent,' one says 'routine.' Majority vote: 'urgent.' The split vote triggers a flag for human review.
from collections import Counter

from openai import OpenAI

client = OpenAI()

def consensus_vote(question: str, n_agents: int = 5) -> dict:
    # Query each agent independently; temperature > 0 diversifies the samples.
    raw_answers = []
    for _ in range(n_agents):
        response = client.chat.completions.create(
            model="gpt-4o",
            temperature=0.7,
            messages=[{
                "role": "user",
                "content": f"{question}\n\nGive a concise final answer after ANSWER:",
            }],
        )
        raw_answers.append(response.choices[0].message.content)

    # Extract the comparable answer from each response (step 3).
    answers = []
    for text in raw_answers:
        if "ANSWER:" in text:
            answers.append(text.split("ANSWER:")[-1].strip())

    # Guard against the case where no agent followed the output format.
    if not answers:
        return {"answer": None, "confidence": 0.0, "votes": {}}

    # Majority vote; the winner's share of all agents is the confidence.
    votes = Counter(answers)
    winner, count = votes.most_common(1)[0]
    return {"answer": winner, "confidence": count / n_agents, "votes": dict(votes)}
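The example's human-review flag can hang off the confidence field returned above. A minimal sketch, assuming an escalation threshold of 0.75 (the threshold value is an assumption, not from the source):

```python
def needs_human_review(result: dict, threshold: float = 0.75) -> bool:
    # A split vote (confidence below the threshold) triggers escalation;
    # unanimous or near-unanimous results pass through automatically.
    # The 0.75 default is an illustrative choice, not a prescribed value.
    return result["confidence"] < threshold

# A 2-vs-1 split like the medical classification example (2/3 ≈ 0.67):
split = {"answer": "urgent", "confidence": 2 / 3, "votes": {"urgent": 2, "routine": 1}}
print(needs_human_review(split))  # True
```

Tuning the threshold trades review workload against the risk of acting on contested answers.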