
Token Gluttony

🚫 Anti-Pattern · Context · Industry practice
🚫 Anti-Pattern — This describes a common mistake to avoid, not a pattern to follow.

The Anti-Pattern

Architectural choices that waste tokens systematically — verbose system prompts, full tool outputs dumped into context, unnecessary chain-of-thought on trivial tasks.

Why It Happens

Developers dump everything into context ‘just in case’: full API responses with 90% irrelevant fields, verbose system prompts repeated on every call, chain-of-thought reasoning forced on simple lookups. Token costs scale linearly, but value doesn’t — past a certain point, more tokens mean more noise, not more signal. The worst part is that token gluttony often masquerades as thoroughness.

How to Fix It

Audit token usage per component and trim ruthlessly. Use prompt caching for static prefixes that don’t change between calls. Reserve chain-of-thought for tasks where reasoning genuinely improves accuracy. Apply structured output schemas to minimize response tokens. Trim tool outputs to only the fields the agent actually needs. The principle is simple: every token in context should earn its place. If you can’t explain why a token is there, it shouldn’t be.
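The "trim tool outputs" advice can be sketched as a simple whitelist filter. This is illustrative only — the field names and response shape below are hypothetical, not from any particular API:

```python
# Sketch: keep only the fields the agent actually uses from a tool's
# JSON response before it enters the model context. All field names
# here are hypothetical examples.

def trim_tool_output(response: dict, keep: set[str]) -> dict:
    """Drop every top-level field not in the whitelist."""
    return {k: v for k, v in response.items() if k in keep}

# A full API response: most of these fields are irrelevant to the task.
full_response = {
    "id": "prod_123", "name": "Widget", "status": "in_stock",
    "price": 9.99, "created_at": "2024-01-01T00:00:00Z",
    "updated_at": "2024-06-01T00:00:00Z", "internal_sku": "W-123",
    "warehouse_ids": [1, 2, 3], "audit_log": ["..."] * 50,
}

trimmed = trim_tool_output(full_response, keep={"name", "status", "price"})
print(trimmed)  # {'name': 'Widget', 'status': 'in_stock', 'price': 9.99}
```

The same idea applies one level earlier: define the tool's output schema with only the needed fields, so the trimming happens before serialization rather than after.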

Diagram

  Token Glutton:                    Optimized:
  ┌──────────────────────┐          ┌──────────────────────┐
  │██████████████████████│          │██░░░░░░░░░░░░░░░░░░░░│
  │███ SYSTEM PROMPT ████│          │SP░░░░░░░░░░░░░░░░░░░░│
  │██████████████████████│          │░░                  ░░│
  │███ FULL API RESP ████│          │░░ Available for    ░░│
  │██████████████████████│          │░░ actual work      ░░│
  │█ CoT on trivial task█│          │░░                  ░░│
  │██████ task ██████████│          │░░░░░░░░ task ░░░░░░░░│
  └──────────────────────┘          └──────────────────────┘
   80% waste, 20% task              15% overhead, 85% task

Symptoms

  • Token costs are high relative to task complexity
  • System prompts are thousands of tokens with instructions the agent rarely uses
  • Full JSON API responses are dumped into context when only 2-3 fields matter
  • Chain-of-thought is forced on every task regardless of difficulty

False Positives

  • Complex tasks that genuinely require rich context to perform well
  • Research tasks where broad context demonstrably improves output quality
  • Early prototyping where optimization is premature

Warning Signs & Consequences

Warning Signs

  • Token costs growing faster than the value delivered by the agent
  • Latency disproportionate to the difficulty of the task
  • Budget alerts or unexpectedly high API bills
  • Context window filling up and truncating actually important information

Consequences

  • Unnecessary API cost that scales with every single request
  • Slower response times from processing irrelevant tokens
  • Reduced effective context window for the actual task at hand
  • Masking real performance issues behind a wall of unnecessary processing

Remediation Steps

  1. Audit token usage: measure tokens per component (system prompt, tools, history, task)
  2. Trim tool output schemas to include only fields the agent actually uses
  3. Implement prompt caching for static system prompt prefixes
  4. Use chain-of-thought selectively — only for tasks where it measurably helps
  5. Set token budgets per section and alert when thresholds are exceeded
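Steps 1 and 5 together can be sketched as a per-component audit with budget alerts. The component names and budget numbers are made up for illustration, and the whitespace split is a crude stand-in for a real tokenizer:

```python
# Sketch: measure tokens per context component and flag any component
# that exceeds its budget. Budgets are illustrative; len(text.split())
# is a rough proxy — swap in your model's actual tokenizer.

BUDGETS = {"system_prompt": 500, "tool_outputs": 1000, "history": 2000}

def estimate_tokens(text: str) -> int:
    return len(text.split())  # replace with a real tokenizer count

def audit(components: dict[str, str]) -> list[str]:
    """Return the names of components exceeding their token budget."""
    over = []
    for name, text in components.items():
        if estimate_tokens(text) > BUDGETS.get(name, float("inf")):
            over.append(name)
    return over

context = {
    "system_prompt": "word " * 800,  # 800 "tokens": over the 500 budget
    "tool_outputs": "word " * 400,   # under budget
}
print(audit(context))  # ['system_prompt']
```

Running an audit like this per request (or sampled) turns token gluttony from a surprise on the monthly bill into an alert at development time.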

Real-World Example

Expensive RAG Queries

A RAG application retrieves 10 full documents (50K tokens) to answer a simple factual question that only needed one paragraph. Each query costs $0.50 instead of $0.02. At 10K queries per day, the team is spending $5,000/day instead of $200/day — a 25x cost multiplier for identical answer quality. The fix was trimming retrieved chunks and limiting to the 2 most relevant passages.
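The fix described above — limiting retrieval to the most relevant passages — can be sketched as a top-k cutoff. The relevance scores and the k=2 threshold below are illustrative; in practice the scores come from your retriever:

```python
# Sketch: instead of dumping all retrieved documents into context,
# keep only the k highest-scoring passages. Scores are illustrative.

def top_k_passages(scored: list[tuple[float, str]], k: int = 2) -> list[str]:
    """Keep the k most relevant passages, dropping the long tail."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:k]]

retrieved = [
    (0.91, "Paragraph that directly answers the question."),
    (0.42, "Loosely related background section."),
    (0.88, "Second supporting paragraph."),
    (0.15, "Irrelevant appendix text."),
]
print(top_k_passages(retrieved, k=2))
# ['Paragraph that directly answers the question.', 'Second supporting paragraph.']
```

Trimming each kept passage to the relevant chunk (rather than the full document) compounds the savings.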
