Self-Consistency

Intermediate | 🧠 Reasoning Patterns | Google Research (Wang et al., 2022)

Intent

Generate multiple independent reasoning paths for the same problem and select the most consistent answer through majority voting.

Problem

A single chain-of-thought can lead to wrong answers because the model happened to take a bad reasoning path. Different reasoning paths might lead to different answers, and you have no way to know which one is correct from a single sample.

Solution

Sample multiple reasoning chains independently (using temperature > 0) and take a majority vote on the final answers. The intuition is that correct reasoning paths tend to converge on the same answer, while incorrect paths tend to scatter across different wrong answers. The most common answer across many samples is therefore the most likely to be correct. This is essentially an ensemble method applied to LLM reasoning.

Diagram

                ┌→ [Reasoning Path 1] → Answer: 42
                │
Question → [Sample N paths] → [Reasoning Path 2] → Answer: 42
                │
                ├→ [Reasoning Path 3] → Answer: 37
                │
                └→ [Reasoning Path 4] → Answer: 42

                     Majority Vote → 42 ✓
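
The vote in the diagram can be sketched in a few lines. This is a minimal illustration, assuming the final answers have already been extracted as strings; `majority_vote` is a hypothetical helper name, not something taken from the paper.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> tuple[str, int]:
    """Return the most common answer and how many paths produced it."""
    if not answers:
        raise ValueError("need at least one answer to vote on")
    # most_common(1) returns [(answer, count)]; when counts tie, the
    # answer that was seen first wins.
    return Counter(answers).most_common(1)[0]


# The four paths from the diagram: three agree on 42, one says 37.
winner, count = majority_vote(["42", "42", "37", "42"])
print(winner, count)  # prints: 42 3
```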

When to Use

  • Mathematical reasoning where there's a single correct answer
  • When you need high confidence in the result
  • Tasks where multiple reasoning approaches exist
  • Critical decisions where errors are costly

When NOT to Use

  • Open-ended creative tasks with no single correct answer
  • When cost per query is a constraint (requires N× calls)
  • Simple tasks where the model rarely makes errors

Pros & Cons

Pros

  • Significantly higher accuracy than single-sample CoT
  • Simple to implement: just sample and vote
  • No additional training or fine-tuning needed
  • Confidence correlates with vote margin

Cons

  • N× cost increase (typically 5-40 samples needed)
  • Only works for tasks with definitive answers
  • Higher latency if samples aren't parallelized
  • Diminishing returns beyond a certain sample count
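
To get a feel for the cost/accuracy tradeoff, the sketch below uses a deliberately idealized model: every sample is independently correct with probability `p`, and all wrong samples agree on the same wrong answer. Both are assumptions made for illustration (real samples are correlated, and wrong answers usually scatter), and `majority_accuracy` is a hypothetical helper, so treat the numbers as intuition rather than a prediction.

```python
from math import comb


def majority_accuracy(p: float, n: int) -> float:
    """P(a strict majority of n independent samples is correct).

    Toy model: each sample is right with probability p, samples are
    independent, and n is odd so the vote cannot tie.
    """
    if n % 2 == 0:
        raise ValueError("use an odd n to avoid ties in this toy model")
    need = n // 2 + 1  # smallest number of correct samples that wins the vote
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


# Accuracy of the vote as the sample count grows, for a 60%-accurate sampler.
for n in (1, 5, 11, 21, 41):
    print(f"N={n:>2}: {majority_accuracy(0.6, n):.3f}")
```

In this model the vote's accuracy keeps rising with N, but each additional sample buys less than the one before, while the cost grows linearly.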

Implementation Steps

  1. Identify tasks where the model gives inconsistent answers
  2. Generate N reasoning chains with temperature > 0 (typically N=5 to 40)
  3. Extract the final answer from each chain
  4. Apply majority voting (or weighted voting) to select the answer
  5. Use vote margin as a confidence signal
  6. Tune N based on your accuracy/cost tradeoff
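
Steps 3 to 5 can be sketched as follows. This is one possible shape, not a reference implementation: `normalize` and `weighted_vote` are hypothetical helpers, the normalization rules are assumptions that suit numeric answers, and the weights would have to come from something you supply (for example a verifier score). With every weight set to 1.0 it reduces to a plain majority vote.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    """Canonicalize an answer so trivially different spellings match."""
    cleaned = answer.strip().rstrip(".").replace(",", "").lower()
    try:
        value = float(cleaned)
    except ValueError:
        return cleaned  # not a number: compare as lowercase text
    # Treat "156", "156.0" and "1.56e2" as the same answer.
    return str(int(value)) if value.is_integer() else repr(value)


def weighted_vote(samples: list[tuple[str, float]]) -> dict:
    """Vote over (answer, weight) pairs and report the vote margin."""
    totals: dict[str, float] = defaultdict(float)
    for answer, weight in samples:
        totals[normalize(answer)] += weight
    total_weight = sum(totals.values())
    if total_weight <= 0:
        raise ValueError("no weighted answers to vote on")
    # sorted() is stable, so ties go to the answer seen first.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    top_answer, top_weight = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return {
        "answer": top_answer,
        "share": top_weight / total_weight,  # winner's fraction of the vote
        "margin": (top_weight - runner_up) / total_weight,  # lead over second place
    }


samples = [("156", 1.0), ("156.0", 1.0), ("156", 1.0), ("142", 1.0), ("156.5", 1.0)]
print(weighted_vote(samples))
```

A small margin suggests the samples disagree and the answer may deserve a retry with more samples or a human check; where to set that threshold is a per-task judgment call.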

Real-World Example

Arithmetic Word Problem

For a complex word problem, 5 reasoning chains are generated. Three arrive at '156', one at '142', and one at '156.5'. Majority vote selects '156' with 3 of 5 votes, reported as 60% confidence. The two incorrect paths made different mistakes and so disagreed with each other, while the three correct paths converged.

Python: Multiple Reasoning Paths with Majority Vote
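from collections import Counter

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def self_consistency(question: str, n_samples: int = 5) -> dict:
    """Sample n_samples reasoning chains and majority-vote on the final answers."""
    answers = []

    for _ in range(n_samples):
        # temperature > 0 so each call can follow a different reasoning path
        response = client.chat.completions.create(
            model="gpt-4o",
            temperature=0.7,
            messages=[{
                "role": "user",
                "content": f"{question}\n\nThink step by step. State your final answer after ANSWER:",
            }],
        )
        # content can be None, so fall back to an empty string
        text = response.choices[0].message.content or ""
        # Keep only chains that followed the requested output format
        if "ANSWER:" in text:
            answers.append(text.split("ANSWER:")[-1].strip())

    # Without this guard, most_common(1)[0] raises IndexError on an empty list
    if not answers:
        raise ValueError("No sample produced a parseable ANSWER: line")

    votes = Counter(answers)
    winner, count = votes.most_common(1)[0]
    # "confidence" is the winner's share of the parseable answers
    return {"answer": winner, "confidence": count / len(answers), "votes": dict(votes)}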
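
Because the samples are independent, they can run concurrently to keep latency close to that of a single call. The sketch below is one way to do that with a thread pool. `vote_in_parallel` and `sample_fn` are hypothetical names: `sample_fn` is assumed to wrap one model call plus answer extraction and to return `None` when no answer could be parsed. Rate limits and retries are left out.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def vote_in_parallel(
    question: str,
    sample_fn: Callable[[str], Optional[str]],
    n_samples: int = 5,
    max_workers: int = 5,
) -> dict:
    """Run n_samples calls to sample_fn concurrently, then majority-vote."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each task gets the same question; variation comes from sampling.
        results = list(pool.map(sample_fn, [question] * n_samples))

    answers = [a for a in results if a is not None]  # drop unparseable samples
    if not answers:
        raise ValueError("no sample produced a parseable answer")

    votes = Counter(answers)
    winner, count = votes.most_common(1)[0]
    return {"answer": winner, "confidence": count / len(answers), "votes": dict(votes)}


# Stand-in sampler for a dry run; a real one would call the model.
print(vote_in_parallel("What is 6 * 7?", lambda q: "42", n_samples=4))
```

Threads are a reasonable fit here because each sample spends nearly all of its time waiting on the network; an async client would be an alternative.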

References