Reflexion
Intent
The agent reflects on its past failures, stores verbal self-critiques in memory, and uses them to improve future attempts.
Problem
Agents make mistakes but don't learn from them within a session. If an agent fails at a coding task, it will often make the same type of error on the next attempt because it has no memory of what went wrong. Traditional retry logic just re-runs the same approach.
Solution
After a failed attempt, the agent generates a verbal reflection: a natural-language analysis of what went wrong and what to do differently. This reflection is stored in a short-term memory buffer. On the next attempt, the agent includes past reflections in its context, allowing it to avoid previous mistakes. The key innovation is using natural language as the memory format — it's richer than scalar rewards and directly actionable by the LLM.
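The core loop can be sketched in a few lines. This is a minimal skeleton, not a full implementation: `attempt`, `evaluate`, and `reflect` are hypothetical stubs standing in for the LLM call, the environment check, and the self-critique step.

```python
def attempt(task: str, reflections: list[str]) -> str:
    # Stub: a real agent would call an LLM with the task plus past reflections.
    return f"solution-{len(reflections)}"

def evaluate(solution: str) -> tuple[bool, str]:
    # Stub: a real evaluator would run tests; here we succeed on the third try.
    passed = solution.endswith("2")
    return passed, "" if passed else f"{solution} failed"

def reflect(solution: str, error: str) -> str:
    # Stub: a real agent would ask the LLM to analyze the failure in prose.
    return f"{error}; try a different approach"

def reflexion_loop(task: str, max_attempts: int = 5) -> tuple[str, list[str]]:
    reflections: list[str] = []
    solution = ""
    for _ in range(max_attempts):
        solution = attempt(task, reflections)
        passed, error = evaluate(solution)
        if passed:
            return solution, reflections
        # Store a verbal self-critique, not a scalar reward
        reflections.append(reflect(solution, error))
    return solution, reflections
```

The structure is identical regardless of the task: the only memory carried between attempts is the list of natural-language reflections.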
Diagram
Attempt 1 → [Execute] → Fail
↓
[Reflect: 'I failed because X. Next time I should Y.']
↓
[Store in memory]
↓
Attempt 2 → [Execute with reflection context] → Fail/Pass
↓ (if fail)
[Reflect: 'Y didn't work either. Try Z.']
↓
Attempt 3 → [Execute with all reflections] → Pass ✓
When to Use
- Tasks where the agent can attempt, evaluate, and retry (code, math, reasoning)
- When error signals are available (test results, compiler errors)
- Long-horizon tasks where learning from early mistakes is critical
- When you want the agent to improve within a single session
When NOT to Use
- One-shot tasks where retrying isn't possible
- When there's no reliable evaluation signal
- Simple tasks where the first attempt usually succeeds
Pros & Cons
Pros
- Learns from mistakes within a single session
- Natural-language reflections are rich and actionable
- Avoids repeating the same errors
- Improves over time without model fine-tuning
Cons
- Requires a reliable evaluation signal (tests, scoring, etc.)
- Multiple retries increase latency and cost
- Reflections can be wrong — the agent may misdiagnose failures
- Reflection context grows with each attempt, consuming tokens
Implementation Steps
1. Implement the action/evaluation loop (agent tries, environment evaluates)
2. Build the reflection prompt: 'Analyze what went wrong and what to do differently'
3. Store reflections in a memory buffer (simple list or sliding window)
4. Include recent reflections in the context for each new attempt
5. Set a maximum retry count to bound cost
6. Monitor whether reflections actually improve subsequent attempts
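Steps 3 and 4 can be covered by a small sliding-window buffer, which also addresses the growing-context cost noted under Cons. The `ReflectionMemory` class below is an illustrative sketch, not a required API.

```python
from collections import deque

class ReflectionMemory:
    """Sliding-window buffer for verbal reflections.

    Keeping only the most recent reflections bounds the token cost of
    including them in each new attempt's context.
    """
    def __init__(self, max_size: int = 3):
        self.buffer: deque = deque(maxlen=max_size)

    def add(self, reflection: str) -> None:
        # Oldest entry is evicted automatically once maxlen is reached
        self.buffer.append(reflection)

    def as_context(self) -> str:
        # Formatted block to prepend to the next attempt's prompt
        return "\n".join(f"Reflection {i + 1}: {r}" for i, r in enumerate(self.buffer))

memory = ReflectionMemory(max_size=2)
memory.add("Forgot the empty-list edge case.")
memory.add("Off-by-one in indexing.")
memory.add("Wrong return type.")
# Only the two most recent reflections remain in the window.
```

Whether a plain list or a sliding window is better depends on task length: for short retry loops a full list is fine; for long-horizon tasks the window keeps context bounded.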
Real-World Example
Code Generation with Test Feedback
Agent writes a function, runs unit tests, 3 of 5 fail. Reflection: 'I didn't handle the edge case where the list is empty, and I used 0-based indexing when the spec expected 1-based.' Next attempt includes this reflection and passes 5/5 tests.
from openai import OpenAI

client = OpenAI()

def solve_with_reflexion(problem: str, test_fn, max_attempts: int = 3) -> str:
    reflections = []
    for attempt in range(max_attempts):
        # Include all past reflections in the prompt for this attempt
        reflection_ctx = "\n".join(
            f"Attempt {i + 1}: {r}" for i, r in enumerate(reflections)
        )
        prompt = f"{problem}\n\nPast reflections:\n{reflection_ctx}" if reflections else problem
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Write Python code. Return only the function."},
                {"role": "user", "content": prompt},
            ],
        )
        code = response.choices[0].message.content
        passed, error = test_fn(code)
        if passed:
            return code
        # Reflect: ask the model to analyze what went wrong
        reflection = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Code:\n{code}\n\nError:\n{error}\n\nWhat went wrong and how to fix it?"}],
        )
        reflections.append(reflection.choices[0].message.content)
    # All attempts exhausted: return the last (failing) candidate
    return code