Parallelization
Intent
Run multiple LLM calls simultaneously and aggregate their results — either by splitting a task into independent parts or by getting diverse perspectives on the same task.
Problem
Sequential processing is slow, and single-perspective outputs lack robustness. When you need to evaluate multiple aspects of a document, or want higher confidence in a judgment, running one call at a time wastes time and leaves quality on the table.
Solution
Execute multiple LLM calls in parallel, then programmatically combine results. This manifests in two variations: Sectioning splits a task into independent subtasks that run simultaneously. For example, guardrail checking runs in parallel with the main response generation. Voting runs the same task multiple times with different prompts or temperatures and selects the best answer (majority vote, average score, etc.).
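The voting variation can be sketched in a few lines. This is a minimal illustration, not a production implementation: `ask_model` is a hypothetical stand-in for a real LLM call, and the fixed temperatures and keyword answers are placeholders for real sampling.

```python
import asyncio
from collections import Counter

async def ask_model(question: str, temperature: float) -> str:
    # Hypothetical stub standing in for an LLM API call; a real
    # implementation would await an async client here.
    await asyncio.sleep(0)
    return "yes" if temperature < 0.9 else "no"

async def vote(question: str, n: int = 3) -> str:
    # Fan-out: run n attempts concurrently at different temperatures
    answers = await asyncio.gather(
        *(ask_model(question, t) for t in (0.2, 0.5, 0.8, 1.0)[:n])
    )
    # Aggregate: majority vote over the attempts
    return Counter(answers).most_common(1)[0][0]

result = asyncio.run(vote("Is this claim supported by the document?"))
```

Other aggregation strategies (averaging numeric scores, picking the longest or highest-confidence answer) slot into the same fan-out/aggregate shape.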
Diagram
Sectioning:

        ┌→ [Subtask A] ─┐
Input → [Fan-out]       [Aggregate]
        └→ [Subtask B] ─┘

Voting:

        ┌→ [Attempt 1] ─┐
Input → [Fan-out]       [Vote/Merge]
        └→ [Attempt 2] ─┘

When to Use
- Independent subtasks that can safely run at the same time
- When you need multiple perspectives for higher confidence
- Guardrail checks that should run alongside the main task
- Evaluation tasks where each criterion can be assessed independently
When NOT to Use
- Tasks where subtasks depend on each other's outputs
- When cost is more important than speed or quality
- Simple tasks that don't benefit from multiple perspectives
Pros & Cons
Pros
- Significantly faster than sequential execution
- Voting reduces errors and hallucinations
- Guardrails don't add to main response latency
- Each parallel branch can use different prompts or models
Cons
- Higher total token cost (multiple calls)
- Aggregation logic can be complex
- Harder to debug than sequential flows
- Not all tasks are safely parallelizable
Implementation Steps
1. Identify which parts of your task are independent
2. Decide between sectioning (different subtasks) and voting (same task, multiple attempts)
3. Implement parallel execution (Promise.all, asyncio.gather, etc.)
4. Design the aggregation strategy: merge, vote, pick-best, average
5. Handle partial failures — what if one branch fails?
6. Set timeouts to prevent slow branches from blocking results
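Steps 5 and 6 can be combined in asyncio: `gather(..., return_exceptions=True)` keeps one failing branch from taking down the rest, and `wait_for` bounds each branch's runtime. A sketch with a hypothetical `flaky` coroutine standing in for an LLM call:

```python
import asyncio

async def flaky(name: str, delay: float) -> str:
    # Hypothetical branch: stands in for an LLM call that may fail
    await asyncio.sleep(delay)
    if name == "bad":
        raise RuntimeError("branch failed")
    return f"{name}: ok"

async def run_branches() -> list[str]:
    # Timeouts (step 6): wait_for bounds how long any branch may run
    tasks = [
        asyncio.wait_for(flaky("a", 0.01), timeout=1.0),
        asyncio.wait_for(flaky("bad", 0.01), timeout=1.0),
        asyncio.wait_for(flaky("b", 0.01), timeout=1.0),
    ]
    # Partial failures (step 5): return_exceptions=True yields the
    # exception object in place of a result instead of raising
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Keep successes, record failures instead of crashing the whole fan-out
    return [r if isinstance(r, str) else f"error: {r}" for r in results]

outcome = asyncio.run(run_branches())
```

Whether a failed branch should degrade the result gracefully or abort the whole task depends on the branch: a failed guardrail check usually must abort, while a failed quality check might only be noted in the report.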
Real-World Example
Code Review with Guardrails
When a user submits code for review: Branch A checks for security vulnerabilities, Branch B reviews code quality, Branch C scans for PII/secrets. All three run simultaneously. Results are merged into a unified report. If any branch flags critical issues, the review is escalated.
import asyncio

import anthropic

client = anthropic.AsyncAnthropic()

async def check(code: str, aspect: str, instruction: str) -> dict:
    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=300,
        messages=[{"role": "user", "content": f"{instruction}\n\nCode:\n{code}"}],
    )
    return {"aspect": aspect, "findings": response.content[0].text}

async def parallel_review(code: str) -> list[dict]:
    # Fan-out: run all three checks simultaneously
    results = await asyncio.gather(
        check(code, "security", "Find security vulnerabilities in this code."),
        check(code, "quality", "Review code quality: naming, complexity, style."),
        check(code, "pii", "Scan for hardcoded secrets, API keys, or PII."),
    )
    # Aggregate: gather preserves call order, so results align with the checks above
    return list(results)
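The merge-and-escalate step described above might look like the sketch below. The keyword check for "critical" is an assumption for illustration; a real system would more likely ask each branch for a structured severity field.

```python
def merge_report(results: list[dict]) -> dict:
    # Combine branch findings into one report, escalating if any
    # branch flags a critical issue (keyword match is a simplification)
    findings = {r["aspect"]: r["findings"] for r in results}
    escalate = any("critical" in f.lower() for f in findings.values())
    return {"findings": findings, "escalate": escalate}

# Sample branch outputs standing in for real LLM findings
sample = [
    {"aspect": "security", "findings": "Critical: SQL injection in query builder"},
    {"aspect": "quality", "findings": "Minor naming inconsistencies"},
    {"aspect": "pii", "findings": "No secrets found"},
]
report = merge_report(sample)
```

In production this would be called as `merge_report(await parallel_review(code))` inside the async flow.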