
Prompt Chaining

Basic · ⛓️ Workflow Patterns · Anthropic

Intent

Decompose a task into a fixed sequence of steps, where each LLM call processes the output of the previous one.

Problem

Complex tasks that ask an LLM to do too many things at once produce unreliable results. A single prompt that says "analyze this document, extract the key themes, translate them to Spanish, and format as bullet points" will often drop steps or lose quality. The more you cram into one call, the worse each individual piece becomes — the model's attention is spread too thin.

Solution

Break the task into a pipeline of smaller, focused prompts. Each step does one thing well and passes its output to the next. You can insert programmatic checks (gates) between steps to verify intermediate results before continuing. This trades latency for accuracy — each LLM call is simpler and more reliable. The key insight is that each step should be a task the LLM can do well in a single shot. If you find yourself needing chain-of-thought reasoning within a step, the step might be too complex and should be split further.
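The pipeline-with-gates structure can be sketched generically. This is a minimal sketch, not a production implementation: the `Step` dataclass and `run_chain` helper are hypothetical names, and plain string functions stand in for real LLM calls so the skeleton is runnable on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Step:
    run: Callable[[str], str]              # the LLM call (stubbed here with plain functions)
    gate: Optional[Callable[[str], bool]]  # programmatic check on the step's output

def run_chain(text: str, steps: List[Step]) -> str:
    """Feed each step's output into the next, stopping at any failed gate."""
    for i, step in enumerate(steps, start=1):
        text = step.run(text)
        if step.gate is not None and not step.gate(text):
            raise ValueError(f"Gate failed after step {i}: {text!r}")
    return text

# Hypothetical two-step chain; str.strip / str.upper stand in for focused prompts
steps = [
    Step(run=str.strip, gate=lambda s: len(s) > 0),
    Step(run=str.upper, gate=str.isupper),
]
result = run_chain("  hello chain  ", steps)
```

In a real chain each `run` would be an LLM call and each `gate` a cheap programmatic check (length, format, required keywords) that fails fast before the next, more expensive call.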

Diagram

Input → [LLM Call 1] → Gate ✓ → [LLM Call 2] → Gate ✓ → [LLM Call 3] → Output
                              ↓ ✗                        ↓ ✗
                            [Fail]                      [Fail]

When to Use

  • Tasks that can be cleanly decomposed into fixed, sequential subtasks
  • When each step's output needs validation before proceeding
  • When trading latency for accuracy is acceptable
  • Document processing pipelines: extract → analyze → summarize → format

When NOT to Use

  • Tasks where the number of steps can't be predicted in advance
  • When low latency is critical and the task is simple enough for one call
  • When steps have complex interdependencies (use Orchestrator-Workers instead)

Pros & Cons

Pros

  • Each step is simple and reliable
  • Easy to debug — you can inspect intermediate outputs
  • Gates catch errors early before they cascade
  • Easy to modify individual steps without affecting others

Cons

  • Higher latency due to sequential execution
  • Errors in early steps propagate to later ones
  • Rigid — number of steps is fixed at design time
  • Higher total token cost than a single call
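The error-propagation and rigidity drawbacks can be softened by retrying a failed step with the gate's feedback folded into the next attempt, rather than failing the whole chain. A minimal sketch, with a hypothetical `run_with_retry` helper and a stubbed generator in place of a real LLM call:

```python
def run_with_retry(generate, gate, max_attempts=3):
    """Re-run a step whose gate fails, feeding the failure reason back in."""
    feedback = ""
    for _ in range(max_attempts):
        output = generate(feedback)
        ok, reason = gate(output)
        if ok:
            return output
        feedback = f"Previous attempt was rejected: {reason}. Please fix and retry."
    raise RuntimeError(f"Step still failing after {max_attempts} attempts")

# Stub generator that only succeeds once it receives gate feedback
calls = []
def flaky_generate(feedback: str) -> str:
    calls.append(feedback)
    return "ok" if not feedback else "a much longer final draft"

def length_gate(text: str):
    return (len(text) > 5, "output too short")

result = run_with_retry(flaky_generate, length_gate)
```

This keeps the chain sequential but turns a hard gate failure into a bounded retry loop, at the cost of extra latency and tokens on the retried step.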

Implementation Steps

  1. Identify the natural subtasks in your workflow
  2. Design a prompt for each subtask that does one thing well
  3. Define the output format of each step so it can be parsed and passed to the next
  4. Add validation gates between steps to catch errors early
  5. Implement error handling: what happens when a gate fails?
  6. Test each step independently, then test the full chain
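Steps 3 and 4 above often combine naturally: if each step emits JSON with a known schema, the gate is just a parse-and-validate function. A minimal sketch (the `json_gate` helper is a hypothetical name, not part of any SDK):

```python
import json
from typing import Optional

def json_gate(raw: str, required_keys: set) -> Optional[dict]:
    """Parse a step's output and verify the expected keys exist; None means the gate failed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not required_keys <= data.keys():
        return None
    return data

good = json_gate('{"themes": ["speed"], "summary": "fast"}', {"themes", "summary"})
bad = json_gate("not json at all", {"themes"})
```

Returning the parsed dict (rather than the raw string) also gives the next step structured input, which makes the downstream prompt simpler to write.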

Real-World Example

Marketing Copy Pipeline

Generate marketing copy, then translate it: Step 1 — LLM generates English marketing copy for a product. Gate checks for brand-voice compliance. Step 2 — A second LLM call translates the approved copy into Spanish. Gate checks for translation quality using back-translation.

Python: Marketing Copy Pipeline with Quality Gates
import anthropic

client = anthropic.Anthropic()

def run_pipeline(product: str) -> str:
    # Step 1: Generate marketing copy
    copy = client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=1024,
        messages=[{"role": "user", "content": f"Write 2-paragraph marketing copy for: {product}"}]
    ).content[0].text

    # Step 2: Quality gate — brand voice check
    check = client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=256,
        messages=[{"role": "user", "content": f"Does this match a professional, friendly tone? Reply PASS or FAIL with reason.\n\n{copy}"}]
    ).content[0].text

    if "FAIL" in check:
        raise ValueError(f"Brand voice check failed: {check}")

    # Step 3: Translate approved copy
    return client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=1024,
        messages=[{"role": "user", "content": f"Translate to Spanish, preserving tone:\n\n{copy}"}]
    ).content[0].text
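The example's second gate (translation quality via back-translation) could look like the sketch below. The helpers `back_translation_gate` and `word_overlap` are hypothetical, and a lambda stands in for the real translate-back LLM call; a real pipeline would back-translate with another `client.messages.create` call and might use a stronger similarity measure than word overlap.

```python
def back_translation_gate(original, translated, translate_back, similarity, threshold=0.5):
    """Translate the output back to the source language and compare with the original."""
    round_trip = translate_back(translated)
    return similarity(original, round_trip) >= threshold

def word_overlap(a: str, b: str) -> float:
    """Crude similarity: Jaccard overlap of lowercased word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

# Stub: pretend the back-translator returns a close paraphrase of the original
ok = back_translation_gate(
    "Fast reliable shipping",
    "Envío rápido y fiable",
    translate_back=lambda s: "Fast and reliable shipping",
    similarity=word_overlap,
)
```

If the gate returns False, the chain can either raise (as the brand-voice gate does above) or retry the translation step with the mismatch fed back into the prompt.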
