Token Gluttony
The Anti-Pattern
Architectural choices that waste tokens systematically — verbose system prompts, full tool outputs dumped into context, unnecessary chain-of-thought on trivial tasks.
Why It Happens
Developers dump everything into context ‘just in case.’ Full API responses with 90% irrelevant fields, verbose system prompts repeated on every call, chain-of-thought reasoning forced on simple lookups. Token costs scale linearly, but value doesn’t — past a certain point, more tokens means more noise, not more signal. The worst part is that token gluttony often masquerades as thoroughness.
How to Fix It
Audit token usage per component and trim ruthlessly. Use prompt caching for static prefixes that don’t change between calls. Reserve chain-of-thought for tasks where reasoning genuinely improves accuracy. Apply structured output schemas to minimize response tokens. Trim tool outputs to only the fields the agent actually needs. The principle is simple: every token in context should earn its place. If you can’t explain why a token is there, it shouldn’t be.
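The trimming step above can be sketched as a simple whitelist projection applied to tool output before it enters context. The response shape and field names below are hypothetical:

```python
# Sketch: project a full tool/API response down to only the fields the
# agent actually needs before it enters the model's context.

def trim_response(response: dict, keep: list[str]) -> dict:
    """Keep only whitelisted top-level fields; drop everything else."""
    return {k: response[k] for k in keep if k in response}

# Hypothetical full API response, mostly irrelevant to the agent's task.
full = {
    "id": "ord_123",
    "status": "shipped",
    "total": 42.50,
    "internal_audit_log": ["..."] * 50,     # never used by the agent
    "raw_carrier_payload": {"...": "..."},  # never used by the agent
}

trimmed = trim_response(full, keep=["id", "status", "total"])
# Only the three whitelisted fields reach the context, not the whole payload.
```

The same idea generalizes to nested responses: define the projection once, next to the tool definition, so every call pays the trimming cost instead of the context paying the token cost.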
Diagram
Token Glutton:              Optimized:
┌──────────────────────┐    ┌──────────────────────┐
│██████████████████████│    │██░░░░░░░░░░░░░░░░░░░░│
│████ SYSTEM PROMPT ███│    │SP│                   │
│██████████████████████│    │░░│  Available for    │
│████ FULL API RESP ███│    │░░│  actual work      │
│██████████████████████│    │░░│                   │
│█ CoT on trivial task │    │░░░░░░░░░░░░░░░░░░░░░░│
│█████ task ███████████│    │░░░░░░░░ task ████████│
└──────────────────────┘    └──────────────────────┘
 80% waste, 20% task         15% overhead, 85% task
Symptoms
- Token costs are high relative to task complexity
- System prompts are thousands of tokens with instructions the agent rarely uses
- Full JSON API responses are dumped into context when only 2-3 fields matter
- Chain-of-thought is forced on every task regardless of difficulty
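One rough way to detect the full-JSON symptom is to measure what fraction of a tool response's fields the agent's later messages ever reference. This is a toy heuristic; the 50% threshold and the field names are illustrative:

```python
# Sketch: diagnostic for the "full JSON dumped into context" symptom.
# Counts what fraction of a tool response's top-level fields ever appear
# in the agent's subsequent transcript.

def field_usage_ratio(response: dict, transcript: str) -> float:
    fields = list(response.keys())
    used = [f for f in fields if f in transcript]
    return len(used) / len(fields) if fields else 1.0

response = {"id": 1, "status": "ok", "total": 9.99, "meta": {}, "debug": {}}
transcript = "The order status is ok and the total is 9.99."

ratio = field_usage_ratio(response, transcript)
if ratio < 0.5:  # illustrative threshold
    print(f"Only {ratio:.0%} of fields used; consider trimming the schema")
```

A persistently low ratio across production traces is strong evidence that the tool's output schema, not the agent, is where the tokens are going.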
False Positives
- Complex tasks that genuinely require rich context to perform well
- Research tasks where broad context demonstrably improves output quality
- Early prototyping where optimization is premature
Warning Signs & Consequences
Warning Signs
- Token costs growing faster than the value delivered by the agent
- Latency disproportionate to the difficulty of the task
- Budget alerts or unexpectedly high API bills
- Context window filling up and truncating actually important information
Consequences
- Unnecessary API cost that scales with every single request
- Slower response times from processing irrelevant tokens
- Reduced effective context window for the actual task at hand
- Masking real performance issues behind a wall of unnecessary processing
Remediation Steps
1. Audit token usage: measure tokens per component (system prompt, tools, history, task)
2. Trim tool output schemas to include only fields the agent actually uses
3. Implement prompt caching for static system prompt prefixes
4. Use chain-of-thought selectively — only for tasks where it measurably helps
5. Set token budgets per section and alert when thresholds are exceeded
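Steps 1 and 5 can be sketched together as a per-component audit with budget alerts. The token counter below is a crude word-count estimate assumed for illustration; a real audit would use the provider's tokenizer (e.g. tiktoken for OpenAI models):

```python
# Sketch: audit tokens per prompt component and flag budget overruns.
# Budgets and section contents are illustrative placeholders.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~1.3 tokens per word. Swap in a real tokenizer.
    return int(len(text.split()) * 1.3)

budgets = {"system": 500, "tools": 1000, "history": 2000, "task": 1000}
sections = {
    "system": "You are a helpful assistant ...",
    "tools": '{"name": "search", "description": "..."}',
    "history": "user: hi\nassistant: hello",
    "task": "Summarize the Q3 report.",
}

for name, text in sections.items():
    used = estimate_tokens(text)
    status = "OVER BUDGET" if used > budgets[name] else "ok"
    print(f"{name:8s} {used:5d} / {budgets[name]:5d}  {status}")
```

Running this per request (or sampled) turns "token costs feel high" into a concrete breakdown you can trim against.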
Real-World Example
Expensive RAG Queries
A RAG application retrieves 10 full documents (50K tokens) to answer a simple factual question that only needed one paragraph. Each query costs $0.50 instead of $0.02. At 10K queries per day, the team is spending $5,000/day instead of $200/day — a 25x cost multiplier for identical answer quality. The fix was trimming retrieved chunks and limiting to the 2 most relevant passages.
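The fix described above can be sketched as a rerank-and-truncate step: score retrieved passages against the query and keep only the top two. The word-overlap scorer here is a stand-in for the retriever's real similarity scores or a proper reranker:

```python
# Sketch: limit RAG context to the k most relevant passages instead of
# dumping every retrieved document. Scoring is a toy word-overlap measure.

def _words(text: str) -> set[str]:
    return {w.strip(".,?!").lower() for w in text.split()}

def score(query: str, passage: str) -> float:
    q, p = _words(query), _words(passage)
    return len(q & p) / len(q) if q else 0.0

def top_passages(query: str, passages: list[str], k: int = 2) -> list[str]:
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:k]

passages = [
    "The Eiffel Tower is 330 metres tall.",
    "Quarterly revenue grew by 12 percent.",
    "Paris is the capital of France and home to the Eiffel Tower.",
]
context = top_passages("How tall is the Eiffel Tower?", passages, k=2)
# Only the 2 most relevant passages enter the prompt; the irrelevant
# revenue passage is dropped.
```

Even this naive cutoff captures the cost structure of the example: context size is bounded by k, not by how much the retriever happened to return.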