Parallelization
Intent
Run multiple LLM calls simultaneously and aggregate their results — either by splitting a task into independent parts or by getting diverse perspectives on the same task.
Problem
Sequential processing is slow, and single-perspective outputs lack robustness. When you need to evaluate multiple aspects of a document, or want higher confidence in a judgment, running one call at a time wastes time and leaves quality on the table.
Solution
Execute multiple LLM calls in parallel, then programmatically combine results. This manifests in two variations: Sectioning splits a task into independent subtasks that run simultaneously. For example, guardrail checking runs in parallel with the main response generation. Voting runs the same task multiple times with different prompts or temperatures and selects the best answer (majority vote, average score, etc.).
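The voting variation can be sketched in a few lines. This is a minimal illustration, not a production implementation: `ask_model` is a hypothetical stand-in for a real LLM call, and the fixed temperatures and keyword answers are placeholders for real sampling.

```python
import asyncio
from collections import Counter

async def ask_model(question: str, temperature: float) -> str:
    # Hypothetical stub standing in for an LLM API call; a real
    # implementation would await an async client here.
    await asyncio.sleep(0)
    return "yes" if temperature < 0.9 else "no"

async def vote(question: str, n: int = 3) -> str:
    # Fan-out: run n attempts concurrently at different temperatures
    answers = await asyncio.gather(
        *(ask_model(question, t) for t in (0.2, 0.5, 0.8, 1.0)[:n])
    )
    # Aggregate: majority vote over the attempts
    return Counter(answers).most_common(1)[0][0]

result = asyncio.run(vote("Is this claim supported by the document?"))
```

Other aggregation strategies (averaging numeric scores, picking the longest or highest-confidence answer) slot into the same fan-out/aggregate shape.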
Diagram
Sectioning:

        ┌→ [Subtask A] ─┐
Input → [Fan-out]       [Aggregate]
        └→ [Subtask B] ─┘

Voting:

        ┌→ [Attempt 1] ─┐
Input → [Fan-out]       [Vote/Merge]
        └→ [Attempt 2] ─┘

When to Use
- Independent subtasks that can safely run at the same time
- When you need multiple perspectives for higher confidence
- Guardrail checks that should run alongside the main task
- Evaluation tasks where each criterion can be assessed independently
When NOT to Use
- Tasks where subtasks depend on each other's outputs
- When cost is more important than speed or quality
- Simple tasks that don't benefit from multiple perspectives
Pros & Cons
Pros
- Significantly faster than sequential execution
- Voting reduces errors and hallucinations
- Guardrails don't add to main response latency
- Each parallel branch can use different prompts or models
Cons
- Higher total token cost (multiple calls)
- Aggregation logic can be complex
- Harder to debug than sequential flows
- Not all tasks are safely parallelizable
Implementation Steps
1. Identify which parts of your task are independent
2. Decide between sectioning (different subtasks) and voting (same task, multiple attempts)
3. Implement parallel execution (Promise.all, asyncio.gather, etc.)
4. Design the aggregation strategy: merge, vote, pick-best, average
5. Handle partial failures — what if one branch fails?
6. Set timeouts to prevent slow branches from blocking results
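Steps 5 and 6 can be combined in asyncio: `gather(..., return_exceptions=True)` keeps one failing branch from taking down the rest, and `wait_for` bounds each branch's runtime. A sketch with a hypothetical `flaky` coroutine standing in for an LLM call:

```python
import asyncio

async def flaky(name: str, delay: float) -> str:
    # Hypothetical branch: stands in for an LLM call that may fail
    await asyncio.sleep(delay)
    if name == "bad":
        raise RuntimeError("branch failed")
    return f"{name}: ok"

async def run_branches() -> list[str]:
    # Timeouts (step 6): wait_for bounds how long any branch may run
    tasks = [
        asyncio.wait_for(flaky("a", 0.01), timeout=1.0),
        asyncio.wait_for(flaky("bad", 0.01), timeout=1.0),
        asyncio.wait_for(flaky("b", 0.01), timeout=1.0),
    ]
    # Partial failures (step 5): return_exceptions=True yields the
    # exception object in place of a result instead of raising
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Keep successes, record failures instead of crashing the whole fan-out
    return [r if isinstance(r, str) else f"error: {r}" for r in results]

outcome = asyncio.run(run_branches())
```

Whether a failed branch should degrade the result gracefully or abort the whole task depends on the branch: a failed guardrail check usually must abort, while a failed quality check might only be noted in the report.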
Real-World Example
Code Review with Guardrails
When a user submits code for review: Branch A checks for security vulnerabilities, Branch B reviews code quality, Branch C scans for PII/secrets. All three run simultaneously. Results are merged into a unified report. If any branch flags critical issues, the review is escalated.
import asyncio

import anthropic

client = anthropic.AsyncAnthropic()

async def check(code: str, aspect: str, instruction: str) -> dict:
    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=300,
        messages=[{"role": "user", "content": f"{instruction}\n\nCode:\n{code}"}],
    )
    return {"aspect": aspect, "findings": response.content[0].text}

async def parallel_review(code: str) -> list[dict]:
    # Fan-out: run all three checks simultaneously
    results = await asyncio.gather(
        check(code, "security", "Find security vulnerabilities in this code."),
        check(code, "quality", "Review code quality: naming, complexity, style."),
        check(code, "pii", "Scan for hardcoded secrets, API keys, or PII."),
    )
    # Aggregate: gather preserves call order, so results align with the checks above
    return list(results)
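The merge-and-escalate step described above might look like the sketch below. The keyword check for "critical" is an assumption for illustration; a real system would more likely ask each branch for a structured severity field.

```python
def merge_report(results: list[dict]) -> dict:
    # Combine branch findings into one report, escalating if any
    # branch flags a critical issue (keyword match is a simplification)
    findings = {r["aspect"]: r["findings"] for r in results}
    escalate = any("critical" in f.lower() for f in findings.values())
    return {"findings": findings, "escalate": escalate}

# Sample branch outputs standing in for real LLM findings
sample = [
    {"aspect": "security", "findings": "Critical: SQL injection in query builder"},
    {"aspect": "quality", "findings": "Minor naming inconsistencies"},
    {"aspect": "pii", "findings": "No secrets found"},
]
report = merge_report(sample)
```

In production this would be called as `merge_report(await parallel_review(code))` inside the async flow.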