Self-Consistency

Intermediate · 🧠 Reasoning Patterns · Google Research (Wang et al., 2022)

Intent

Generate multiple independent reasoning paths for the same problem and select the most consistent answer through majority voting.

Problem

A single chain-of-thought can lead to wrong answers because the model happened to take a bad reasoning path. Different reasoning paths might lead to different answers, and you have no way to know which one is correct from a single sample.

Solution

Sample multiple reasoning chains independently (using temperature > 0) and take a majority vote on the final answers. The intuition is that correct reasoning paths tend to converge on the same answer, while incorrect paths tend to scatter across different wrong answers. The most common answer across many samples is therefore more likely to be correct than the answer from any single sample. In effect, this applies ensemble methods to LLM reasoning.
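
The vote can be stated compactly. As a sketch in notation that is not part of the original text, let a_1, ..., a_N be the final answers extracted from N sampled chains. The selected answer and its vote share are then:

```latex
\hat{a} = \arg\max_{a} \sum_{i=1}^{N} \mathbb{1}[a_i = a],
\qquad
\text{share}(\hat{a}) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}[a_i = \hat{a}]
```

For example, with N = 4 and answers (42, 42, 37, 42), the selected answer is 42 with a vote share of 3/4.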

Diagram

                             ┌→ [Reasoning Path 1] → Answer: 42
                             │
Question → [Sample N paths] ─┼→ [Reasoning Path 2] → Answer: 42
                             │
                             ├→ [Reasoning Path 3] → Answer: 37
                             │
                             └→ [Reasoning Path 4] → Answer: 42

                                  Majority Vote → 42 ✓

When to Use

Mathematical reasoning where there's a single correct answer
When you need high confidence in the result
Tasks where multiple reasoning approaches exist
Critical decisions where errors are costly

When NOT to Use

Open-ended creative tasks with no single correct answer
When cost per query is a constraint (requires N× calls)
Simple tasks where the model rarely makes errors

Pros & Cons

Pros

Significantly higher accuracy than single-sample CoT
Simple to implement: just sample and vote
No additional training or fine-tuning needed
Vote margin gives a rough confidence signal (a wider margin suggests a more reliable answer)

Cons

N× cost increase (typically 5-40 samples needed)
Only works for tasks with definitive answers
Higher latency if samples aren't parallelized
Diminishing returns beyond a certain sample count
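
The latency point can be softened by issuing the samples concurrently. The following is a minimal sketch, assuming a hypothetical `sample_fn` that performs one model call and returns the extracted final answer; the names `sample_in_parallel`, `sample_fn`, and `max_workers` are illustrative and do not come from any library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def sample_in_parallel(
    sample_fn: Callable[[str], str],
    question: str,
    n_samples: int = 5,
    max_workers: int = 5,
) -> List[str]:
    """Call sample_fn(question) n_samples times using a thread pool.

    sample_fn is a hypothetical stand-in for whatever function performs one
    model call and returns the extracted final answer as a string.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(sample_fn, question) for _ in range(n_samples)]
        # result() blocks until the call finishes and re-raises any
        # exception that occurred in the worker thread.
        return [future.result() for future in futures]
```

Threads are a reasonable fit here because each call mostly waits on the network. Parallelism reduces wall-clock time only; the total cost is still N calls, and provider rate limits may cap a practical `max_workers`.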

Implementation Steps

1. Identify tasks where the model gives inconsistent answers
2. Generate N reasoning chains with temperature > 0 (typically N=5 to 40)
3. Extract the final answer from each chain
4. Apply majority voting (or weighted voting) to select the answer
5. Use vote margin as a confidence signal
6. Tune N based on your accuracy/cost tradeoff
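
Steps 4 and 5 can be sketched together. The function below is a hedged illustration, not a reference implementation: the name `vote`, the definition of margin as the winner's share minus the runner-up's share, and the optional `weights` argument are all assumptions made here. What the weights represent (for example, a per-chain score) is left to the caller.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def vote(
    answers: List[str], weights: Optional[List[float]] = None
) -> Tuple[str, float, float]:
    """Return (winner, share, margin) for a list of extracted answers.

    share  = winner's fraction of the total weight
    margin = winner's share minus the runner-up's share (0.0 for a tie)
    With weights omitted, every chain counts equally (plain majority vote).
    """
    if not answers:
        raise ValueError("no answers to vote on")
    if weights is None:
        weights = [1.0] * len(answers)
    if len(weights) != len(answers):
        raise ValueError("weights and answers must have the same length")

    # Accumulate the weight behind each distinct answer.
    totals: Dict[str, float] = defaultdict(float)
    for answer, weight in zip(answers, weights):
        totals[answer] += weight

    total_weight = sum(totals.values())
    if total_weight <= 0:
        raise ValueError("total weight must be positive")

    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    winner, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, top / total_weight, (top - runner_up) / total_weight
```

A low margin is a natural trigger for drawing more samples or escalating to a human, which is one way to approach the tuning in step 6.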

Real-World Example

Arithmetic Word Problem

For a complex word problem, 5 reasoning chains are generated. Three arrive at '156', one at '142', and one at '156.5'. Majority vote selects '156' with a 60% vote share (3 of 5). The two incorrect paths made different mistakes, while the three correct paths converged.
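
In practice, chains often format the same number differently, which would split the vote if answers were compared as raw strings. The sketch below reworks the example under that assumption: the formatting variants in `raw_answers` ("156.0", "$156") are invented for illustration, and `normalize` is a hypothetical helper, not part of any library.

```python
from collections import Counter
from decimal import Decimal, InvalidOperation


def normalize(raw: str) -> str:
    """Canonicalize a numeric answer; other text is trimmed and lowercased."""
    cleaned = raw.strip().rstrip(".").replace(",", "").lstrip("$")
    try:
        number = Decimal(cleaned)
    except InvalidOperation:
        return raw.strip().lower()
    # normalize() drops trailing zeros so "156.0" and "156" compare equal;
    # the "f" format avoids scientific notation for values such as 100.
    return format(number.normalize(), "f")


# Five extracted answers: three are 156 in different formats.
raw_answers = ["156", "156.0", "$156", "142", "156.5"]

votes = Counter(normalize(answer) for answer in raw_answers)
winner, count = votes.most_common(1)[0]
share = count / len(raw_answers)  # 3 of 5 chains agree on the winner
```

Without normalization, the three correct chains here would count as three different answers and no answer would hold a majority.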

Python: Multiple Reasoning Paths with Majority Vote

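from collections import Counter

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def self_consistency(question: str, n_samples: int = 5) -> dict:
    answers = []

    # Sample n_samples independent reasoning chains; temperature > 0 makes them differ.
    for _ in range(n_samples):
        response = client.chat.completions.create(
            model="gpt-4o",
            temperature=0.7,
            messages=[{
                "role": "user",
                "content": f"{question}\n\nThink step by step. State your final answer after ANSWER:",
            }],
        )
        text = response.choices[0].message.content or ""
        # Keep only chains that followed the requested output format.
        if "ANSWER:" in text:
            answers.append(text.split("ANSWER:")[-1].strip())

    # No chain produced a parseable answer: return an empty result instead of raising.
    if not answers:
        return {"answer": None, "confidence": 0.0, "votes": {}}

    votes = Counter(answers)
    winner, count = votes.most_common(1)[0]
    # "confidence" is the winner's share of parsed answers, not a calibrated probability.
    return {"answer": winner, "confidence": count / len(answers), "votes": dict(votes)}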
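
A possible variant requests all samples in a single call through the Chat Completions `n` parameter, which returns several choices for one prompt. This is a sketch under the assumption that the chosen model and endpoint accept `n` greater than 1; the function name `self_consistency_single_request` is hypothetical, and the client is passed in as an argument so the function can be exercised with a stub.

```python
from collections import Counter


def self_consistency_single_request(client, question: str, n_samples: int = 5) -> dict:
    """Variant that asks for all samples in one request via the `n` parameter."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.7,
        n=n_samples,  # assumption: the model accepts n > 1
        messages=[{
            "role": "user",
            "content": f"{question}\n\nThink step by step. State your final answer after ANSWER:",
        }],
    )

    answers = []
    for choice in response.choices:
        text = choice.message.content or ""
        # Keep only completions that followed the requested output format.
        if "ANSWER:" in text:
            answers.append(text.split("ANSWER:")[-1].strip())

    if not answers:
        return {"answer": None, "confidence": 0.0, "votes": {}}

    votes = Counter(answers)
    winner, count = votes.most_common(1)[0]
    return {"answer": winner, "confidence": count / len(answers), "votes": dict(votes)}
```

This replaces N round trips with one, which mainly helps latency; the number of generated completions, and therefore most of the cost, stays the same.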

References

Self-Consistency Improves Chain of Thought Reasoning in Language Models (Wang et al., 2022)