
Prompt Chaining

Basic · ⛓️ Workflow Patterns · Anthropic

Intent

Decompose a task into a fixed sequence of steps, where each LLM call processes the output of the previous one.

Problem

Complex tasks that ask an LLM to do too many things at once produce unreliable results. A single prompt that says "analyze this document, extract the key themes, translate them to Spanish, and format as bullet points" will often drop steps or lose quality. The more you cram into one call, the worse each individual piece becomes — the model's attention is spread too thin.

Solution

Break the task into a pipeline of smaller, focused prompts. Each step does one thing well and passes its output to the next. You can insert programmatic checks (gates) between steps to verify intermediate results before continuing. This trades latency for accuracy — each LLM call is simpler and more reliable. The key insight is that each step should be a task the LLM can do well in a single shot. If you find yourself needing chain-of-thought reasoning within a step, the step might be too complex and should be split further.
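The pipeline-with-gates structure can be sketched generically. This is a minimal sketch, not a production implementation: the `Step` dataclass and `run_chain` helper are hypothetical names, and plain string functions stand in for real LLM calls so the skeleton is runnable on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Step:
    run: Callable[[str], str]              # the LLM call (stubbed here with plain functions)
    gate: Optional[Callable[[str], bool]]  # programmatic check on the step's output

def run_chain(text: str, steps: List[Step]) -> str:
    """Feed each step's output into the next, stopping at any failed gate."""
    for i, step in enumerate(steps, start=1):
        text = step.run(text)
        if step.gate is not None and not step.gate(text):
            raise ValueError(f"Gate failed after step {i}: {text!r}")
    return text

# Hypothetical two-step chain; str.strip / str.upper stand in for focused prompts
steps = [
    Step(run=str.strip, gate=lambda s: len(s) > 0),
    Step(run=str.upper, gate=str.isupper),
]
result = run_chain("  hello chain  ", steps)
```

In a real chain each `run` would be an LLM call and each `gate` a cheap programmatic check (length, format, required keywords) that fails fast before the next, more expensive call.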

Diagram

Input → [LLM Call 1] → Gate ✓ → [LLM Call 2] → Gate ✓ → [LLM Call 3] → Output
                              ↓ ✗                        ↓ ✗
                            [Fail]                      [Fail]

When to Use

  • Tasks that can be cleanly decomposed into fixed, sequential subtasks
  • When each step's output needs validation before proceeding
  • When trading latency for accuracy is acceptable
  • Document processing pipelines: extract → analyze → summarize → format

When NOT to Use

  • Tasks where the number of steps can't be predicted in advance
  • When low latency is critical and the task is simple enough for one call
  • When steps have complex interdependencies (use Orchestrator-Workers instead)

Pros & Cons

Pros

  • Each step is simple and reliable
  • Easy to debug — you can inspect intermediate outputs
  • Gates catch errors early before they cascade
  • Easy to modify individual steps without affecting others

Cons

  • Higher latency due to sequential execution
  • Errors in early steps propagate to later ones
  • Rigid — number of steps is fixed at design time
  • Higher total token cost than a single call
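The error-propagation and rigidity drawbacks can be softened by retrying a failed step with the gate's feedback folded into the next attempt, rather than failing the whole chain. A minimal sketch, with a hypothetical `run_with_retry` helper and a stubbed generator in place of a real LLM call:

```python
def run_with_retry(generate, gate, max_attempts=3):
    """Re-run a step whose gate fails, feeding the failure reason back in."""
    feedback = ""
    for _ in range(max_attempts):
        output = generate(feedback)
        ok, reason = gate(output)
        if ok:
            return output
        feedback = f"Previous attempt was rejected: {reason}. Please fix and retry."
    raise RuntimeError(f"Step still failing after {max_attempts} attempts")

# Stub generator that only succeeds once it receives gate feedback
calls = []
def flaky_generate(feedback: str) -> str:
    calls.append(feedback)
    return "ok" if not feedback else "a much longer final draft"

def length_gate(text: str):
    return (len(text) > 5, "output too short")

result = run_with_retry(flaky_generate, length_gate)
```

This keeps the chain sequential but turns a hard gate failure into a bounded retry loop, at the cost of extra latency and tokens on the retried step.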

Implementation Steps

  1. Identify the natural subtasks in your workflow
  2. Design a prompt for each subtask that does one thing well
  3. Define the output format of each step so it can be parsed and passed to the next
  4. Add validation gates between steps to catch errors early
  5. Implement error handling: what happens when a gate fails?
  6. Test each step independently, then test the full chain
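Steps 3 and 4 above often combine naturally: if each step emits JSON with a known schema, the gate is just a parse-and-validate function. A minimal sketch (the `json_gate` helper is a hypothetical name, not part of any SDK):

```python
import json
from typing import Optional

def json_gate(raw: str, required_keys: set) -> Optional[dict]:
    """Parse a step's output and verify the expected keys exist; None means the gate failed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not required_keys <= data.keys():
        return None
    return data

good = json_gate('{"themes": ["speed"], "summary": "fast"}', {"themes", "summary"})
bad = json_gate("not json at all", {"themes"})
```

Returning the parsed dict (rather than the raw string) also gives the next step structured input, which makes the downstream prompt simpler to write.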

Real-World Example

Marketing Copy Pipeline

Generate marketing copy, then translate it: Step 1 — LLM generates English marketing copy for a product. Gate checks for brand-voice compliance. Step 2 — A second LLM call translates the approved copy into Spanish. Gate checks for translation quality using back-translation.

Python: Marketing Copy Pipeline with Quality Gates
import anthropic

client = anthropic.Anthropic()

def run_pipeline(product: str) -> str:
    # Step 1: Generate marketing copy
    copy = client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=1024,
        messages=[{"role": "user", "content": f"Write 2-paragraph marketing copy for: {product}"}]
    ).content[0].text

    # Step 2: Quality gate — brand voice check
    check = client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=256,
        messages=[{"role": "user", "content": f"Does this match a professional, friendly tone? Reply PASS or FAIL with reason.\n\n{copy}"}]
    ).content[0].text

    if "FAIL" in check:
        raise ValueError(f"Brand voice check failed: {check}")

    # Step 3: Translate approved copy
    return client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=1024,
        messages=[{"role": "user", "content": f"Translate to Spanish, preserving tone:\n\n{copy}"}]
    ).content[0].text
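The example's second gate (translation quality via back-translation) could look like the sketch below. The helpers `back_translation_gate` and `word_overlap` are hypothetical, and a lambda stands in for the real translate-back LLM call; a real pipeline would back-translate with another `client.messages.create` call and might use a stronger similarity measure than word overlap.

```python
def back_translation_gate(original, translated, translate_back, similarity, threshold=0.5):
    """Translate the output back to the source language and compare with the original."""
    round_trip = translate_back(translated)
    return similarity(original, round_trip) >= threshold

def word_overlap(a: str, b: str) -> float:
    """Crude similarity: Jaccard overlap of lowercased word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

# Stub: pretend the back-translator returns a close paraphrase of the original
ok = back_translation_gate(
    "Fast reliable shipping",
    "Envío rápido y fiable",
    translate_back=lambda s: "Fast and reliable shipping",
    similarity=word_overlap,
)
```

If the gate returns False, the chain can either raise (as the brand-voice gate does above) or retry the translation step with the mismatch fed back into the prompt.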
