Consensus Voting
Intent
Multiple agents independently solve the same problem, then vote to determine the best answer.
Problem
Any single agent can produce a wrong answer. Relying on one output for critical decisions is risky. You need a way to increase confidence without simply retrying the same approach.
Solution
Deploy multiple agents (potentially with different models, prompts, or temperatures) to independently solve the same problem. Collect all outputs and apply a voting mechanism: majority vote for discrete answers, or a scoring/ranking system for complex outputs. The consensus answer is typically more reliable than any individual response. This extends Self-Consistency from a single-model technique to a multi-agent architecture.
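For discrete answers, the aggregation step can be as small as a counter over the agents' extracted answers. A minimal sketch of majority voting; the answer strings below are hypothetical stand-ins for real agent outputs:

```python
from collections import Counter

def majority_vote(answers: list[str]) -> tuple[str, float]:
    """Return the most common answer and its share of the vote."""
    # Normalize so superficially different strings ("Urgent ", "urgent")
    # count as the same ballot.
    normalized = [a.strip().lower() for a in answers]
    votes = Counter(normalized)
    winner, count = votes.most_common(1)[0]
    return winner, count / len(normalized)

# Hypothetical outputs from four independent agents:
answer, confidence = majority_vote(["Urgent", "urgent ", "Routine", "Urgent"])
print(answer, confidence)  # urgent 0.75
```

The returned share doubles as the confidence signal: 1.0 means a unanimous vote, values near 1/k for k candidates mean an even split.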
Diagram
       ┌→ [Agent 1 (GPT-4)] → Answer A
       ├→ [Agent 2 (Claude)] → Answer A
Task → [Distribute]
       ├→ [Agent 3 (Gemini)] → Answer B
       └→ [Agent 4 (GPT-4, different prompt)] → Answer A
Vote: A wins (3:1) → Final Answer: A
When to Use
- Critical decisions where accuracy is paramount
- When you have access to multiple models or configurations
- Quality assurance for high-stakes outputs
- Reducing model-specific biases through diversity
When NOT to Use
- Low-stakes tasks where one agent is sufficient
- When cost is more important than accuracy
- Creative tasks where diversity of output is desired
Pros & Cons
Pros
- Higher accuracy than any single agent
- Reduces model-specific biases
- Vote margin provides a confidence signal
- Simple aggregation logic
Cons
- N× cost increase
- Only works for tasks with definitive answers
- All agents might share the same blind spots
- Latency limited by the slowest agent
Implementation Steps
1. Select 3-5 diverse agents (different models, prompts, or configurations)
2. Send the same task to all agents in parallel
3. Collect responses and extract comparable answers
4. Apply voting: majority vote, weighted vote, or ranked choice
5. Use the vote margin as confidence: unanimous = high, split = low
6. For low-confidence results, escalate to human review
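Step 4 admits variants beyond simple majority. One option is a weighted vote, where agents with a stronger track record count for more; the per-agent weights below are hypothetical stand-ins for, e.g., historical accuracy scores:

```python
from collections import defaultdict

def weighted_vote(ballots: list[tuple[str, float]]) -> dict:
    """ballots: (answer, weight) pairs, one per agent."""
    totals: dict[str, float] = defaultdict(float)
    for answer, weight in ballots:
        totals[answer] += weight
    winner = max(totals, key=totals.get)
    # The winner's weight share doubles as the confidence signal (step 5).
    confidence = totals[winner] / sum(totals.values())
    return {"answer": winner, "confidence": confidence, "totals": dict(totals)}

# Three agents say A, one says B, with hypothetical reliability weights:
result = weighted_vote([("A", 0.9), ("A", 0.7), ("B", 0.8), ("A", 0.6)])
print(result["answer"], round(result["confidence"], 2))  # A 0.73
```

With equal weights this reduces to a plain majority vote, so the same aggregator covers both cases.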
Real-World Example
Medical Document Classification
A clinical document needs to be classified by urgency. Three agents with different specializations independently classify it. Two say 'urgent,' one says 'routine.' Majority vote: 'urgent.' The split vote triggers a flag for human review.
from collections import Counter

from openai import OpenAI

client = OpenAI()

def consensus_vote(question: str, n_agents: int = 5) -> dict:
    # Query each agent independently; temperature > 0 diversifies the samples.
    raw_answers = []
    for _ in range(n_agents):
        response = client.chat.completions.create(
            model="gpt-4o",
            temperature=0.7,
            messages=[{
                "role": "user",
                "content": f"{question}\n\nGive a concise final answer after ANSWER:",
            }],
        )
        raw_answers.append(response.choices[0].message.content)

    # Extract the comparable answer from each response (step 3).
    answers = []
    for text in raw_answers:
        if "ANSWER:" in text:
            answers.append(text.split("ANSWER:")[-1].strip())

    # Guard against the case where no agent followed the output format.
    if not answers:
        return {"answer": None, "confidence": 0.0, "votes": {}}

    # Majority vote; the winner's share of all agents is the confidence.
    votes = Counter(answers)
    winner, count = votes.most_common(1)[0]
    return {"answer": winner, "confidence": count / n_agents, "votes": dict(votes)}
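The example's human-review flag can hang off the confidence field returned above. A minimal sketch, assuming an escalation threshold of 0.75 (the threshold value is an assumption, not from the source):

```python
def needs_human_review(result: dict, threshold: float = 0.75) -> bool:
    # A split vote (confidence below the threshold) triggers escalation;
    # unanimous or near-unanimous results pass through automatically.
    # The 0.75 default is an illustrative choice, not a prescribed value.
    return result["confidence"] < threshold

# A 2-vs-1 split like the medical classification example (2/3 ≈ 0.67):
split = {"answer": "urgent", "confidence": 2 / 3, "votes": {"urgent": 2, "routine": 1}}
print(needs_human_review(split))  # True
```

Tuning the threshold trades review workload against the risk of acting on contested answers.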