Budgeting Your Claude Code Tokens


I used to watch the wrong number.

When I first started paying for Claude Code, I’d keep one eye on the output token counter — watching it tick up as code got written, assuming that’s where the money was going. Turns out, I was fixating on the receipt for a small coffee while my rent went unpaid.

The real cost isn’t what Claude generates. It’s what you put into context before asking it to generate anything.

Every Claude Code session is a budget. The meter starts at $0, and every action adds to the bill: how you search for code, how long you keep your session open, whether your cache is warm or cold, and what you accidentally drag into context and never let go. Some of those spending decisions are wildly efficient. Others are quiet budget leaks that compound across every turn you take.

This post is a spending guide. It draws on analytical models and empirical data from 149 real sessions to answer one question: where does the money actually go — and how do you stop wasting it?

TL;DR: Discovery — not code generation — accounts for 30–40% of session cost in brownfield codebases. Switching from broad ripgrep queries to ast-grep or ck-search reduces discovery tokens by 70–88%. Prompt caching cuts input costs by 90%. But cold starts, context noise, and poor session architecture erode those savings faster than most developers realize. Tools like GSD and Spec-Kit exist specifically to solve the architectural problem — not as productivity wrappers, but as budget management systems.

Written for developers using Claude Code or GitHub Copilot on production codebases — especially anyone evaluating whether workflow or search tooling is worth the setup cost.


Your Token Budget Has Four Line Items

Before any tooling talk, it helps to have a mental model. Your per-session token budget breaks down into four categories:

#  Line item          What drives it                                 The lever
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1  Discovery          How you find code before changing it           Tool choice: rg vs ast-grep vs ck-search
2  MCP tool overhead  Fixed tax per turn for registered servers      Hygiene: only enable what you’ll use
3  Caching            How much of your context is served from cache  Session discipline: stay in sessions, protect the prefix
4  Output             What Claude generates                          Precision: correct on first try

That’s the whole game. Everything else is commentary on these four numbers.

Discovery is your biggest variable expense — and most developers never look at it. MCP overhead is a fixed per-turn tax that compounds silently. Caching is your biggest savings lever — and most developers accidentally break it. Output looks tiny in token volume but hits hardest in dollars per token.

Let’s go through each one.


Line Item 1: Discovery

Discovery is how Claude Code finds the code it needs before writing anything. It’s also the most surprising line item for most developers — because unlike output tokens, discovery tokens are invisible in the UI.

The primitives break down like this.

Read

Read injects the entire file into the conversation as a tool result. A 500-line TypeScript file costs roughly 3,000–5,000 tokens. Those tokens stay in conversation history for every subsequent turn — not just the current one.

Read("src/auth/service.ts")  →  entire file injected into context
→ 500-line file ≈ 3,000–5,000 tokens
→ persists in ALL subsequent turns

Read is the most expensive primitive per call, and its cost compounds. Read ten files across the first five turns, and those tokens are being re-sent to the API at turn 25. The model isn’t “forgetting” earlier content — quite the opposite. The conversation history becomes a graveyard of file dumps that keeps growing the per-turn context.

Read is also the right tool when you know exactly which file you need. The issue isn’t Read — it’s reaching for Read before you know which file matters.
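To put a rough number on that compounding, here is a minimal Python sketch (the helper name `read_carry_cost` is hypothetical). It assumes every Read result stays in history and is re-sent as a cache read on each later turn at the $0.30/MTok price used throughout this post, and it leaves out the one-time cache write.

```python
def read_carry_cost(reads: dict[int, int], total_turns: int,
                    cache_read_per_mtok: float = 0.30) -> float:
    """Dollars spent re-sending Read results on every turn after they were read.

    `reads` maps the turn a file was read (1-based) to its token count.
    Assumption: each result stays in history and is served as a cache read
    on every later turn; the initial cache write is not included.
    """
    resent_tokens = 0
    for read_turn, tokens in reads.items():
        later_turns = max(0, total_turns - read_turn)
        resent_tokens += tokens * later_turns
    return resent_tokens * cache_read_per_mtok / 1_000_000


# Ten ~4,000-token files, read two per turn across turns 1-5 of a 25-turn session.
ten_files = {turn: 2 * 4_000 for turn in range(1, 6)}
print(f"${read_carry_cost(ten_files, total_turns=25):.3f}")
```

Under those assumptions, the ten files cost about $0.26 in re-sent cache reads by turn 25, on top of what it cost to read them in the first place.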

ripgrep (rg)

Claude Code’s Grep tool runs ripgrep under the hood. Token cost depends on how many lines match and how much surrounding context you request.

rg "authenticate" src/           # every matching line across every file
rg -C 3 "authenticate" src/      # matching lines + 3 lines before/after each

ripgrep has no concept of relevance. Every match is equal. A broad query on a brownfield codebase is a token bomb: rg "error" . on a 50k LOC project returns 800+ lines — 4,000 to 8,000 tokens for a single tool call, all of it persisted in history.

The secondary cost is output tokens. The model receives all those matches and has to mentally filter them, burning output tokens to reason about noisy input before it can form a plan.

Glob

Glob is cheap — it returns only file paths, not contents.

glob("**/*.ts")  →  list of file paths
→ 200 files ≈ 400–800 tokens (just names)

Best used to discover what exists before committing to Read or rg. The “filenames first” pattern (rg "processPayment" src/ -l) is the same idea applied to content search: confirm where something lives before you load the whole file.
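The same “filenames first” idea can be sketched in plain Python to make the token gap visible. This is an illustration, not how Claude Code’s tools work internally: the ~4 characters per token ratio is a rough assumption, and all three function names are hypothetical.

```python
from pathlib import Path


def estimate_tokens(text: str) -> int:
    """Rough token estimate, assuming ~4 characters per token."""
    return (len(text) + 3) // 4


def files_containing(root: str, needle: str, glob: str = "*.ts") -> list[str]:
    """Relative paths of files under root whose text contains needle.

    Similar in spirit to `rg -l needle root`: paths only, no file contents.
    """
    base = Path(root)
    hits = []
    for path in sorted(base.rglob(glob)):
        if path.is_file() and needle in path.read_text(errors="ignore"):
            hits.append(path.relative_to(base).as_posix())
    return hits


def listing_vs_contents(root: str, needle: str, glob: str = "*.ts") -> tuple[int, int]:
    """Estimated tokens for the path listing vs. reading every matching file."""
    paths = files_containing(root, needle, glob)
    listing = estimate_tokens("\n".join(paths))
    contents = sum(estimate_tokens((Path(root) / p).read_text(errors="ignore"))
                   for p in paths)
    return listing, contents
```

Calling `listing_vs_contents("src", "processPayment")` on a real project shows the difference between confirming where a symbol lives and loading every file that mentions it.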

Bash

Arbitrary shell commands. Token cost equals the length of stdout/stderr. git log --oneline -20 is cheap. cat large_file.json is a Read in disguise.


Line Item 2: The MCP Registration Tax

Here’s what nobody tells you about MCP servers: you pay for them on every single turn, whether you use them or not.

Every API request includes the complete tool definitions for all registered MCP servers — the tool names, descriptions, parameters, and schema. This overhead is unavoidable and applies even before a single tool is called. The native Claude Code tools cost ~800 tokens per turn. Add ck-search and Serena and that jumps to ~3,700 tokens — on every turn, for the life of the session.

The Per-Turn MCP Tax

MCPs active            Tool def tokens/turn  Amortized cost/turn (cached, after turn 1)
─────────────────────────────────────────────────────────────────────────────────────
None (native only)     ~800                  $0.00024
+ ck-search (6 tools)  ~1,500                $0.00045
+ Serena (~15 tools)   ~3,000                $0.00090
+ Both                 ~3,700                $0.00111

Pricing: Sonnet 4.6 at $3.00/MTok input, $0.30/MTok cache read.

The amortized column is what you pay after the cache warms up — turn 2 onward. These numbers seem small in isolation, but the effect compounds.

A 30-turn session with both MCPs active but never called:

Cache write (turn 1): 3,700 × $3.75/MTok = $0.0139
Cache reads (turns 2–30): 29 × 3,700 × $0.30/MTok = $0.0322
Total: $0.046 on tool definitions, of which about $0.036 covers the 2,900 tokens of MCP definitions you never used (the other ~800 tokens are native tools you pay for anyway)

Run five such sessions per day for a five-day week:

$0.036 × 5 sessions × 5 days = $0.90/week in pure MCP overhead

The rule is simple: an unused MCP is a tax on every turn.
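The session arithmetic is easy to reproduce for your own MCP mix. A minimal sketch, assuming the Sonnet 4.6 cache prices from this post and a cache that stays warm for the whole session; the function and constant names are illustrative.

```python
CACHE_WRITE_PER_MTOK = 3.75  # Sonnet 4.6, 5-minute TTL
CACHE_READ_PER_MTOK = 0.30


def tool_definition_cost(def_tokens: int, turns: int) -> float:
    """Dollar cost of carrying tool definitions through a warm-cache session.

    Turn 1 writes the definitions to the cache; turns 2..N read them back.
    Assumption: the cache never expires mid-session.
    """
    if turns <= 0:
        return 0.0
    write = def_tokens * CACHE_WRITE_PER_MTOK
    reads = (turns - 1) * def_tokens * CACHE_READ_PER_MTOK
    return (write + reads) / 1_000_000


NATIVE_TOKENS = 800
BOTH_MCPS_TOKENS = 3_700
total = tool_definition_cost(BOTH_MCPS_TOKENS, turns=30)
mcp_only = tool_definition_cost(BOTH_MCPS_TOKENS - NATIVE_TOKENS, turns=30)
print(f"all tool definitions: ${total:.3f}, MCP share: ${mcp_only:.3f}")
```

Subtracting the ~800 native tokens isolates the part of the bill that per-project MCP registration can actually remove.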

The other MCP cost is latency. A slow cold LSP initialization (Serena’s language server starting up) can add extra reasoning turns while the model waits, generating unnecessary output tokens. Register MCPs per-project rather than globally so they’re only active when actually useful.


Line Item 3: Caching — Your Best Investment

Claude’s prompt caching is the single most impactful optimization in the entire system, and it largely runs automatically. If discovery is where you can overspend, caching is where you win it back.

How the Cache Works

The cache is keyed on exact prefix hashes. After turn 1, all prior context — system prompt, tool definitions, conversation history — is eligible to be served from cache:

sequenceDiagram
    participant Ctx as Context
    participant Cache
    participant LLM

    Note over Ctx,LLM: Turn 1 (cold start, cache WRITE)
    Ctx->>LLM: system + tools + user(1)
    LLM-->>Cache: WRITE prefix hash

    Note over Ctx,LLM: Turn 2 (prefix served from cache)
    Ctx->>Cache: system + tools + user(1)
    Cache-->>Ctx: HIT (90% cheaper than fresh input)
    Ctx->>LLM: asst(1) + user(2) [new only]
    LLM-->>Cache: WRITE new suffix

    Note over Ctx,LLM: Turn 3 (growing hit)
    Ctx->>Cache: system + tools + user(1) + asst(1) + user(2)
    Cache-->>Ctx: HIT
    Ctx->>LLM: asst(2) + user(3) [new only]
    LLM-->>Cache: WRITE new suffix

Each turn, the cache hit grows. By turn 5, roughly 80% of context is being served from cache. By turn 10, that figure is closer to 87–90%.
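A toy model of that growth, assuming the whole prompt is written to cache at the end of every turn, so each turn pays fresh only for the previous assistant reply plus the new user message. Real hit rates depend on where cache breakpoints sit and how large each turn’s tool results are; the function name is hypothetical.

```python
def cache_hit_fractions(prefix_tokens: int, user_tokens: list[int],
                        assistant_tokens: list[int]) -> list[float]:
    """Share of each turn's input tokens served from cache (toy model).

    Assumption: the full prompt is cached at the end of every turn, so
    turn n reads everything sent on turn n-1. Turn 1 is always cold.
    """
    fractions = []
    previous_input = 0  # tokens sent (and cached) on the previous turn
    history = 0         # user + assistant tokens from completed turns
    for n, user in enumerate(user_tokens):
        current_input = prefix_tokens + history + user
        fractions.append(previous_input / current_input)
        previous_input = current_input
        history += user + assistant_tokens[n]
    return fractions


hits = cache_hit_fractions(8_000, [200] * 10, [800] * 10)
print([f"{h:.0%}" for h in hits])
```

The fraction climbs as cached history grows relative to what each turn adds; a turn that injects a large tool result pulls it back down.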

The Pricing Math

Token type               Price (Sonnet 4.6)  vs. regular input
──────────────────────────────────────────────────────────────
Regular input            $3.00/MTok          1×
Cache write (5-min TTL)  $3.75/MTok          1.25×
Cache write (1-hr TTL)   $6.00/MTok          2×
Cache read               $0.30/MTok          0.1×
Output                   $15.00/MTok         5×

A 40,000-token context hitting cache costs $0.012 per turn instead of $0.12. Over a 30-turn session, that’s the difference between $3.60 and $0.36 on input costs alone.
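The comparison is one multiplication per price tier. A minimal sketch using the Sonnet 4.6 input-side rates from the pricing table; the dictionary and function names are illustrative.

```python
INPUT_PRICES_PER_MTOK = {  # Sonnet 4.6 input-side rates
    "input": 3.00,
    "cache_write_5m": 3.75,
    "cache_write_1h": 6.00,
    "cache_read": 0.30,
}


def input_cost(tokens: int, kind: str) -> float:
    """Dollar cost of sending `tokens` input tokens billed at the `kind` rate."""
    return tokens * INPUT_PRICES_PER_MTOK[kind] / 1_000_000


context = 40_000
cold = input_cost(context, "input")
warm = input_cost(context, "cache_read")
print(f"per turn: ${cold:.3f} uncached vs ${warm:.3f} cached")
print(f"30 turns: ${30 * cold:.2f} vs ${30 * warm:.2f}")
```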

Real session data confirms this: across 149 actual Claude Code sessions, 94.6% of all tokens were served from cache, saving $2,028 in input costs — sessions that would have cost roughly $2,475 without caching cost $247 with it.

The Cold Start Penalty

The cold start penalty is real and measurable in the empirical data:

Turn count bucket  Sessions  Cache hit %  Avg cost/turn
───────────────────────────────────────────────────────
1–10 turns         27        88.9%        2.19¢
11–30 turns        44        99.9%        1.40¢
31–100 turns       48        99.9%        1.51¢
101–300 turns      23        100.0%       2.66¢
300+ turns         4         100.0%       2.76¢

Short sessions are the most expensive per turn of any bucket under 100 turns. The 1–10 turn bucket has the fewest tokens per turn but a higher cost per turn than the 11–100 turn buckets, because its cold-start cache write is spread over so few turns. Once the cache warms after turn 3–5, the effective per-turn cost drops by 36% (2.19¢ to 1.40¢).

The 100+ turn buckets are expensive for a different reason: context has grown very large (89k–100k tokens/turn). Their 100% cache hit rates show caching is working perfectly — they’re just caching more content.

Two single-turn sessions from the same day on the same project illustrate the cold start penalty most starkly:

Session         Turns  Tokens  Cache hit  Cost
──────────────────────────────────────────────────────────────
A (cache cold)  1      25,700  0.0%       $0.0566
B (cache warm)  1      16,200  100%       $0.0062 (9× cheaper)

Same question. Same model. The only difference is whether the context was already cached.

What Breaks the Cache

Change                                 Cache effect
───────────────────────────────────────────────────────────────────────────────────────────────
New MCP registered mid-session         Invalidates from tool definitions forward
CLAUDE.md edited mid-session           Invalidates from system prompt
New turn added                         Cache hit on all prior turns; only new content is fresh
tasks.md checkbox updated (Spec-Kit)   Partial invalidation on tasks prefix
Session gap > 5 minutes (default TTL)  Full cold start on next turn

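The session gap issue is subtle. The 5-minute TTL restarts every time the cache is read, so active work keeps it warm; walk away for ten minutes mid-session, though, and the next turn re-processes the entire history without a cache hit. For a 30k-token context, that’s $0.090 at the base input rate (about $0.113 when the history is written back to cache at $3.75/MTok) vs $0.009 with a warm cache.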
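A minimal sketch for spotting cold turns in a session log, assuming turn start times in minutes and the default 5-minute TTL restarting on every cache read. Real request timing is messier (a long-running turn delays the next read), so treat this as an approximation; both helper names are hypothetical.

```python
def cold_turns(turn_times_min: list[float], ttl_min: float = 5.0) -> list[int]:
    """Indices of turns that start with a cold cache.

    Assumption: one cached prefix whose TTL restarts whenever it is read,
    so only an idle gap longer than the TTL between consecutive turns
    forces a cold start. Turn 0 is always cold.
    """
    cold = []
    for i, t in enumerate(turn_times_min):
        if i == 0 or t - turn_times_min[i - 1] > ttl_min:
            cold.append(i)
    return cold


def rewarm_penalty(context_tokens: int, write_per_mtok: float = 3.75,
                   read_per_mtok: float = 0.30) -> float:
    """Extra dollars when a turn re-writes the context instead of reading it."""
    return context_tokens * (write_per_mtok - read_per_mtok) / 1_000_000


times = [0, 2, 4, 15, 17, 19]  # minutes; an 11-minute gap before turn index 3
print(cold_turns(times))  # [0, 3]
print(f"${rewarm_penalty(30_000):.4f} extra on the cold turn")
```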


Line Item 4: Output

Output tokens make up only 0.29–0.59% of total token volume in a typical session — but they account for a disproportionate share of the bill.

At $15/MTok, output costs 50× more per token than a cache read. A turn generating 3,000 tokens of code costs $0.045 in output alone — roughly equal to the cache-read cost of a 150,000-token context. Once your cache is warm and large, output generation becomes the dominant variable cost. It’s the line item that spikes when something goes wrong.

The only lever here is precision: ask clearly, provide the right context, and get it right the first time. The best search tools (coming up) help here indirectly — a model with a clean, precise context window generates better output on the first try and needs fewer correction turns.


How Codebase Size Multiplies Every Line Item

All four line items above are real in any session, but their proportions shift dramatically based on how large and how old your codebase is. For greenfield projects, most of this complexity evaporates.

Greenfield: Low Stakes, Flat Curve

On a greenfield project, the budget is almost boring. Context sources are limited to spec artifacts and files you just created. rg results are sparse. The token trajectory is flat and predictable. Cache hit rates are high because the system prompt (your CLAUDE.md or constitution) stays stable.

On a project with 20 files, rg "processPayment" src/ returns a few clean results. The model gets a clean signal. Context stays under 20k tokens for most sessions. rg is fine.

The only real greenfield risk is over-engineering your governance docs. A verbose CLAUDE.md adds 3,000+ tokens to every turn for no practical benefit. Keep it under 800 tokens.

Brownfield: Where All Four Line Items Get Expensive

In a brownfield codebase, every line item amplifies. Discovery is heavy because existing code matches everything. Reads are frequent because you need to understand patterns before you can change them. Token trajectory is steep early. And because discovery noise accumulates in context, it keeps raising your caching costs for the rest of the session — even after it’s stopped being relevant.

On a codebase with 40k lines of code, rg "auth" src/ -C 3 might return 8,000 tokens in a single call. That result sits in context for the next 40 turns, costing $0.30/MTok on every cache read even after it’s stopped being useful.

The Discovery Cost Problem

The comparison for understanding an existing auth system makes the scale of the problem concrete:

Discovery approach                                    Token cost
─────────────────────────────────────────────────────────────────────────────
rg "auth" . -C 5                                      8,000–20,000 tokens
rg with 3–4 specific patterns                         3,000–8,000 tokens
ck --sem "authentication" --threshold 0.7 --limit 10  400–800 tokens
Serena: find_symbol + get_references                  200–400 tokens
ck + Serena combined                                  600–1,200 tokens

The 20× token difference between naive rg and a ck/Serena combination isn’t just a turn-1 cost. Those extra 19,000 tokens stay in context for the rest of the session, costing $0.30/MTok on every subsequent turn’s cache read. In a 30-turn session, 19,000 excess context tokens add:

19,000 × $0.30/MTok × 30 turns = $0.171 in extra cache reads alone

That’s before accounting for the quality degradation from the model reasoning through noisy context.
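The persistence cost is easy to check for any excess-token figure. A one-function sketch at the $0.30/MTok cache-read price (the name is hypothetical); like the formula above, it ignores the quality cost entirely.

```python
def persistent_noise_cost(excess_tokens: int, turns: int,
                          cache_read_per_mtok: float = 0.30) -> float:
    """Cache-read dollars for context that stays in history for `turns` turns."""
    return excess_tokens * cache_read_per_mtok * turns / 1_000_000


print(f"${persistent_noise_cost(19_000, 30):.3f}")  # $0.171
print(f"${persistent_noise_cost(8_000, 40):.3f}")   # $0.096
```

The second call is the single 8,000-token `rg "auth" src/ -C 3` result from the 40k LOC example, carried for 40 turns.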

The Brownfield Trap

The main brownfield anti-pattern is this sequence:

  1. Ask Claude to add a feature to an existing codebase
  2. Claude runs a few broad rg queries to understand the codebase
  3. Those queries return thousands of tokens of context — most of it not directly relevant
  4. The model plans and implements while carrying that noisy context
  5. By turn 15, the model is hallucinating or drifting because the noise-to-signal ratio is too high
  6. You spend 3–5 correction turns fixing the drift, each carrying that same noisy context

The cost of discovery noise isn’t just the tokens it injects — it’s every correction turn it causes.


Managing the Budget at Scale: Session Architecture

Here’s the thing about the four line items above: you can optimize each one individually, and you’ll still overspend if your session is badly architected.

Discovery noise from turn 2 is still in context at turn 50. Cache TTLs expire mid-work and your next turn re-pays full price on everything. Output quality degrades as the model reasons through accumulated noise. Each of these compounds the others. What starts as a $0.08 discovery cost at turn 2 becomes a $0.30+ drag by turn 30, not because any single turn got expensive, but because you keep paying for that noise on every cached turn that follows.

This is the problem that GSD and Spec-Kit were designed to solve — not as productivity wrappers, but as architectural approaches to context budget management. Understanding them through that lens makes their trade-offs obvious.

GSD: A Fresh Budget Per Phase

GSD (get-shit-done) is built around one core idea: by the time you’re implementing, you shouldn’t be paying for your exploration history.

The architecture separates work into phases, each running as a fresh subagent:

flowchart LR
    subgraph inputs["Persistent Inputs"]
        PM[PROJECT.md]
        RM[REQUIREMENTS.md]
        RO[ROADMAP.md]
    end

    subgraph discuss["Discuss Phase<br/>(fresh subagent)"]
        D[/gsd-discuss-phase/]
    end

    subgraph plan["Plan Phase<br/>(fresh subagent)"]
        P[/gsd-plan-phase/]
    end

    subgraph execute["Execute Phase<br/>(parallel fresh subagents)"]
        E1[Task 1]
        E2[Task 2]
        EN[Task N]
    end

    PM & RM & RO --> D
    D -->|CONTEXT.md| P
    PM & RM --> P
    P -->|PLAN.md XML| E1 & E2 & EN
    PM --> E1 & E2 & EN

The execute subagent is the key. It starts with a clean context containing only what it needs for its specific task. It doesn’t carry the discussion history, the planning back-and-forth, or the codebase exploration noise from earlier phases.

This is a budget reset. Each phase pays its own cold start. Each phase’s discovery tokens never pollute the next phase’s context. The discuss phase can make broad rg queries without penalty, because that noise disappears before execute ever sees a token of it.

GSD’s XML plan format creates a stable prefix for Anthropic’s cache. The content before <tasks> rarely changes, so parallel subagents executing different tasks from the same plan can share cache writes, and re-runs of the same plan get a high cache hit rate.
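Why a stable header helps parallel tasks can be shown with a character-level proxy. This sketch assumes a hypothetical plan layout (GSD’s real PLAN.md schema may differ) and uses `os.path.commonprefix`; the real cache matches token prefixes up to a cache breakpoint, not characters.

```python
import os

# Hypothetical plan header; GSD's actual PLAN.md structure may differ.
PLAN_HEADER = (
    "<plan>\n"
    "  <objective>Add OAuth2 login alongside the existing custom auth</objective>\n"
    "  <constraints>Keep AuthService as the single entry point</constraints>\n"
    "</plan>\n"
)


def task_prompt(header: str, task: str) -> str:
    """Stable plan content first, the task-specific part last."""
    return f"{header}<tasks>\n  <task>{task}</task>\n</tasks>\n"


def shared_prefix_chars(prompts: list[str]) -> int:
    """Length of the character prefix every prompt has in common."""
    return len(os.path.commonprefix(prompts))


prompts = [task_prompt(PLAN_HEADER, t) for t in ("config", "strategy", "routes")]
print(shared_prefix_chars(prompts), "of", len(prompts[0]), "chars shared")
```

Putting the task text first instead would drop the shared prefix to zero, which is the same reason volatile files belong at the end of any load order.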

Built-in token optimizations:

  • Prompt thinning (v1.36.0+): Automatically reduces prompt size for sub-200k models when context pressure rises
  • Knowledge graph (/gsd-graphify): Structured relationships instead of prose RESEARCH.md — lower token cost, higher information density
  • Wave execution: Plans run in parallel subagents with fresh contexts — no cross-contamination
  • discuss_mode: assumptions: For brownfield, reads your code instead of asking questions — eliminates question-round tokens

For brownfield specifically: Run /gsd-map-codebase first. It spawns parallel analysis subagents to scan the codebase and write distilled findings to PROJECT.md. You pay the discovery cost once; every subsequent phase reads a summary instead of re-discovering the codebase.

Real native-subagent session data (79 sessions, same architectural pattern as GSD execute phases):

Avg cold-start cache write (cache_create tokens)   249,473
Avg fresh input tokens per session                 2,796
Avg output tokens per session                      14,483
Avg turns per session                              68.7
Cache read share of all tokens served              94.6%

Each subagent writes to cache once, then reads cheaply for 68 turns. The cold start write cost amortizes to $0.014/turn across the subagent’s lifetime.
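The amortization is the write size times the cache-write price, divided by the number of turns. A quick check with the averages above, assuming every cache_create token was billed at the 5-minute-TTL write price (the function name is illustrative):

```python
def amortized_write_cost(cache_write_tokens: float, turns: float,
                         write_per_mtok: float = 3.75) -> float:
    """Cache-write dollars spread evenly across a subagent's turns."""
    return cache_write_tokens * write_per_mtok / 1_000_000 / turns


print(f"${amortized_write_cost(249_473, 68.7):.4f}/turn")
```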

GSD’s trade-off: multiple cold starts. Each phase spawns a fresh subagent, paying a cache write penalty. For large complex features, this is clearly worth it. For a tiny 3-file change, the cold start overhead isn’t justified. Think of it as: each cold start costs roughly $0.035 — if the phase isolation saves you more than that in avoided noise and correction turns, GSD pays for itself.

Spec-Kit: High-Signal Context by Design

Spec-Kit takes the opposite approach: a single long session with structured spec files that keep the model anchored to project requirements throughout. Instead of resetting the budget between phases, it fills the budget with purposeful signal.

Every /speckit.* command loads a stack of context files:

1. constitution.md   ≈ 500–2,000 tokens  (governance principles)
2. stack.md          ≈ 300–800 tokens    (tech stack facts)
3. spec.md           ≈ 500–3,000 tokens  (feature specification)
4. plan.md           ≈ 1,000–5,000 tokens (implementation plan)
5. tasks.md          ≈ 500–3,000 tokens  (atomic task breakdown)
Cold start baseline: 3,000–14,000 tokens

All of this persists throughout the session. The spec infrastructure serves as a quality anchor — the model can’t drift from project requirements because those requirements are literally in every API call. This is especially valuable for team projects where constitutional constraints (code style, security requirements, architectural boundaries) need to be consistently enforced.

Think of it this way: instead of paying discovery costs every time the model needs to remember project conventions, you pay once at cold start and cache it for the whole session. The spec files aren’t overhead — they’re pre-paid, high-signal context that replaces expensive rediscovery.

The accumulation curve:

Turn 1:   system + spec files loaded        =  8,000 tokens
Turn 5:   + 4 rg/read results              = 14,000 tokens
Turn 10:  + more reads + partial impls     = 22,000 tokens
Turn 20:  + full code files + test runs    = 45,000 tokens
Turn 30:  approaching context pressure

At high turn counts, the model starts summarizing earlier context — potential drift from original spec requirements. This is the “context rot” GSD solved architecturally by isolating phases. Spec-Kit’s answer is different: keep the signal high enough that noise never dominates.

Built-in mitigations:

  • speckit.optimize.tokens: Audits constitution and governance files for token bloat
  • speckit.memorylint.run: Prevents AGENTS.md from duplicating constitution.md content
  • speckit.archive.run: After a feature merge, compresses spec artifacts into .specify/memory/
  • speckit.cleanup.run: Post-implementation review that trims spec file cruft

Cache behavior: constitution.md loaded at turn 1 caches well for the first ~20 turns. Past that, or after a 5-minute TTL expiry, an explicit cache breakpoint helps. tasks.md is the main problem: checkboxes flip as tasks complete, breaking the cache prefix.

Load order matters: constitution → stack → spec → plan → tasks. Most stable files first, so the stable prefix stays cached longest. Never load tasks.md early.
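A small sketch of why order matters, using the spec-file sizes from the Approach C cold start as illustrative inputs. It counts only the spec files; in a real session everything loaded after the changed file, including conversation history, also loses its cache hit. Names are hypothetical.

```python
# Illustrative sizes (tokens), matching the Approach C cold start in this post.
SPEC_SIZES = {"constitution.md": 800, "stack.md": 400, "spec.md": 2_000,
              "plan.md": 1_500, "tasks.md": 800}


def invalidated_tokens(load_order: list[str], sizes: dict[str, int], changed: str) -> int:
    """Spec tokens that lose their cache hit when `changed` is edited.

    A prefix cache is valid only up to the first difference, so the changed
    file and everything loaded after it must be processed again.
    """
    start = load_order.index(changed)
    return sum(sizes[name] for name in load_order[start:])


stable_first = ["constitution.md", "stack.md", "spec.md", "plan.md", "tasks.md"]
tasks_first = ["tasks.md"] + stable_first[:-1]
print(invalidated_tokens(stable_first, SPEC_SIZES, "tasks.md"))  # 800
print(invalidated_tokens(tasks_first, SPEC_SIZES, "tasks.md"))   # 5500
```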

Spec-Kit’s trade-off: accumulation without discipline. The spec files grow as the project evolves. Without speckit.archive.run after each merge, the project memory directory fills with stale specs that bloat every future session. Keep constitution.md under 800 tokens; if it keeps growing, run speckit.optimize.tokens. The budget discipline has to be active, not passive.

The Write Cost Comparison

GSD and Spec-Kit also differ in how they write — and those writes affect cache behavior differently.

GSD writes at phase boundaries. CONTEXT.md (~1k tokens) and PLAN.md (~2.5k tokens) are generated as output at the end of one subagent and consumed as fresh inputs at the start of the next. No cache invalidation occurs because each subagent starts fresh. The real write cost is in output tokens ($15/MTok) to generate these artifacts — about $0.052 total. Crucially, CONTEXT.md is a distillation: it condenses 5,000 tokens of discuss history into 1,000 tokens of structured signal, saving downstream phases from carrying that noise.

Spec-Kit writes mid-session and across sessions. Within a single speckit implement run, tasks.md edits are tool results appended to the conversation — they don’t invalidate the prefix cache. Clean. But between invocations, the updated tasks.md breaks the cache prefix from that point forward: about $0.006 re-cache cost per new command invocation. Small per-invocation, but spec files also grow. A project 6 months in might load 14,000 tokens of spec infrastructure per command vs 6,800 at the start — doubling cold-start cost without anyone noticing.

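Workflow tool  Write artifacts                              Growth profile
──────────────────────────────────────────────────────────────────────────────────────────────────────────────
GSD            Ephemeral per-feature: CONTEXT.md + PLAN.md  Stays small (~1–3k tokens/feature)
Spec-Kit       Persistent spec files in repo                Grows with project history (6.8k early → 15k+ at scale)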

The Net Budget Impact

Both tools add overhead and remove waste. Whether you come out ahead depends entirely on session length and task complexity.

The overhead is fixed and immediate — it’s the price of admission:

GSD additional cost vs bare Claude:
  3 cold starts + CONTEXT.md + PLAN.md output:   +$0.107

Spec-Kit additional cost vs bare Claude:
  Spec prefix (6.8k) + artifact generation:       +$0.047
  Per-turn MCP tax if ck+Serena active:           +$0.001/turn (cached)

The savings compound with complexity. A correction turn at turn 10 (25k context) costs $0.030. A rework sequence at turn 20 (40k context) costs $0.171. Scoped prompts from PLAN.md generate roughly 1,000 fewer tokens per implementation turn than unscoped prompts — $0.015/turn in output savings, or $0.15 over 10 turns.

Breakeven (corrections prevented to recover overhead):
  GSD:       ~2 correction turns, or ~0.6 rework sequences
  Spec-Kit:  ~1 correction turn,  or ~0.3 rework sequences

Here’s what that means by session length:

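          < 5 turns         5–15 turns            15–30 turns           30+ turns
──────────────────────────────────────────────────────────────────────────────────────────
GSD       +$0.10+ overhead  ±$0.00                −$0.05–$0.20 savings  −$0.20–$0.50 savings
Spec-Kit  +$0.05+ overhead  −$0.02–$0.08 savings  −$0.05–$0.15 savings  −$0.10–$0.30 savings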

+ = costs more than no tooling; − = costs less

In practice, Spec-Kit’s overhead is so low that a single prevented clarification turn pays for the whole session. GSD requires more task complexity to justify — typically brownfield features that would otherwise drift past turn 15. Neither tool makes sense for tasks under 5 turns.
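The breakeven figures are a simple division: fixed overhead over the cost of the failure you expect to avoid. A small sketch using the overheads and rework cost quoted above; the `other_savings` parameter is an added assumption for folding in benefits such as cheaper scoped output, and the function name is hypothetical.

```python
def breakeven_events(overhead: float, cost_per_event: float,
                     other_savings: float = 0.0) -> float:
    """Avoided events (correction turns, rework sequences) needed to repay overhead.

    `other_savings` covers savings that accrue regardless (e.g. scoped
    output); it is subtracted from the overhead first.
    """
    net = max(0.0, overhead - other_savings)
    return net / cost_per_event


REWORK_SEQUENCE = 0.171  # rework at turn 20, ~40k context
for name, overhead in (("GSD", 0.107), ("Spec-Kit", 0.047)):
    print(name, round(breakeven_events(overhead, REWORK_SEQUENCE), 2))
```

With no other savings counted, that gives roughly 0.6 rework sequences for GSD and 0.3 for Spec-Kit.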


The Budget Simulation: Four Approaches Head-to-Head

The scenario: add OAuth2 login to a brownfield commerce app (~40k LOC) that already has a custom auth system. This is exactly the kind of task where spending decisions diverge — meaningful discovery is required before implementation, and the tool choice at turn 2 shapes the entire budget from there.

Budget horizon: 10 turns. Model: Claude Sonnet 4.6.

Methodology: Token counts were derived from Claude Code’s actual tool result payloads at realistic codebase sizes (40k LOC, mixed TypeScript/Node.js). Costs use Sonnet 4.6 pricing ($3.00/MTok input, $3.75/MTok cache write, $0.30/MTok cache read, $15.00/MTok output). Empirical data is from 149 real sessions in ~/.claude/session-stats/sessions.csv, analyzed with SQLite.
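The per-turn figures in the simulations follow one formula: new tokens at the cache-write price, previously cached tokens at the cache-read price, and output at $15/MTok. A minimal sketch of that formula (helper names hypothetical):

```python
RATES_PER_MTOK = {"cache_write": 3.75, "cache_read": 0.30, "output": 15.00}


def turn_cost(new_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """One simulated turn: new context is written to cache, prior context is read."""
    return (new_tokens * RATES_PER_MTOK["cache_write"]
            + cached_tokens * RATES_PER_MTOK["cache_read"]
            + output_tokens * RATES_PER_MTOK["output"]) / 1_000_000


def running_totals(turns: list[tuple[int, int, int]]) -> list[float]:
    """Cumulative cost after each (new, cached, output) turn."""
    total, out = 0.0, []
    for new, cached, output in turns:
        total += turn_cost(new, cached, output)
        out.append(total)
    return out


# First two turns of Approach A as (new, cached, output).
print([f"${t:.3f}" for t in running_totals([(3_200, 0, 400), (8_200, 3_600, 500)])])
```

Fed the first two turns of Approach A, it lands within a tenth of a cent of the table; the small gap comes from rounding each turn to three decimals.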

Four approaches are simulated turn-by-turn:

Approach             Tooling
─────────────────────────────────────────────────────────────────────────────────────────────────────
A: rg + Read         Just Claude. Broad rg queries + whole-file reads. No workflow structure.
A’: ast-grep         Same as A, but structural ast-grep queries replace broad rg calls. No MCPs.
B: GSD + Tools       GSD phases (discuss → plan → execute). ck-search + Serena in execute subagent.
C: Spec-Kit + Tools  Single Spec-Kit session. Spec files loaded. ck-search + Serena throughout.

Approach A: rg + Read, No Tools

Discovery relies on broad ripgrep queries and whole-file reads. Everything accumulates in one growing context.

Turn  Action                                    New tok  Cached   Output   Turn $    Running $
────────────────────────────────────────────────────────────────────────────────────────────
   1  Cold start. System + CLAUDE.md (3k).        3,200       0      400   $0.018    $0.018
      Ask: "help me add OAuth to our app"
      [cache WRITE: 3,200 × $3.75/MTok]

   2  rg "auth" . -C 3  →  8,000 token result     8,200   3,600      500   $0.040    $0.058
      [rg returns every auth mention, all files]
      [cache WRITE: 8,200 new tokens]

   3  Read("src/auth/service.ts") → 4,500 tok      4,700  11,600      600   $0.032    $0.090
      [whole file in context]

   4  Read("src/auth/middleware.ts") → 3,200 tok    5,400  16,300      700   $0.034    $0.124
      rg "passport|jwt" → 2,000 more tokens
      [context is now full of auth noise]

   5  Read("src/auth/strategies/") → 2,800 tok      3,000  21,700      800   $0.031    $0.155
      [model now has ~24k tokens of raw auth content]

   6  Plan OAuth approach (model reasons              200  24,500    1,500   $0.030    $0.185
      through all the noise to form a plan)

   7  Scaffold OAuth config                           200  26,200    2,500   $0.047    $0.232

   8  Implement passport strategy                     200  28,900    3,000   $0.054    $0.286

   9  Wire up routes                                  200  32,100    2,500   $0.048    $0.334

  10  Write tests                                     200  34,800    2,000   $0.042    $0.376
────────────────────────────────────────────────────────────────────────────────────────────
TOTAL                                              25,500  34,800   14,500             $0.376

Context at turn 10: ~35,000 tokens — mostly rg noise from turns 2–5.

Discovery cost (turns 2–5): $0.137, or 36% of the total budget just to find things.

The model is reasoning through 8,000 tokens of rg noise every single turn from turn 2 onward. That noise never stops costing money, even once it’s stopped being useful.


Approach A’: ast-grep, No MCP

Same scenario, but structural ast-grep queries replace the broad rg calls. No MCP overhead, no semantic index — better tool choice alone.

Turn  Action                                         New tok  Cached   Output   Turn $    Running $
───────────────────────────────────────────────────────────────────────────────────────────────────
   1  Cold start. System + CLAUDE.md (3k).              3,200       0      400   $0.018    $0.018
      Ask: "help me add OAuth to our app"
      [cache WRITE: 3,200 × $3.75/MTok]

   2  ast-grep --pattern                                  420   3,600      500   $0.015    $0.033
        'class $NAME { $$$ }' src/auth/
      + ast-grep --pattern
        'function $NAME(req, res, next) { $$$ }' src/
      → ~420 tokens (class/function signatures only)
      [vs 8,000+ from rg "auth" . -C 3 in Approach A]

   3  Read("src/auth/service.ts") → 4,500 tok            4,700   4,020      600   $0.025    $0.058
      (ast-grep pointed us to exactly this file)

   4  ast-grep --pattern '$OBJ.authenticate($$$)' src/     300   8,720      700   $0.014    $0.072
      + ast-grep --pattern
        'passport.use(new $STRATEGY($$$))' src/
      → ~300 tokens (call-sites only, no noise)
      [vs rg "passport|jwt" → 2,000+ tokens in A]

   5  ast-grep --pattern 'app.use($MIDDLEWARE)' src/       250  12,020      800   $0.012    $0.084
      → ~250 tokens (middleware registrations only)
      [vs Read("src/auth/strategies/") → 2,800 tok in A]

   6  Plan OAuth approach (clean structural picture,       200  13,270    1,500   $0.021    $0.105
      no raw text noise in context)

   7  Scaffold OAuth config                                200  14,970    2,500   $0.035    $0.140

   8  Implement passport strategy                          200  17,670    3,000   $0.047    $0.187

   9  Wire up routes                                       200  20,870    2,500   $0.040    $0.227

  10  Write tests                                          200  23,570    2,000   $0.036    $0.263
───────────────────────────────────────────────────────────────────────────────────────────────────
TOTAL                                                    9,870  23,570   14,500             $0.263

Context at turn 10: ~24,000 tokens — 31% smaller than Approach A.

Discovery cost (turns 2–5): $0.066, down from $0.137. Same information, about 52% less cost, from 73% fewer new tokens in those turns.

MCP overhead added: $0.00.

ast-grep returned structure — class boundaries, call sites, middleware registrations — without injecting raw textual noise. The model saw exactly what mattered. The discovery tokens were signal, not noise.
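To script the same structural queries outside Claude Code, a thin wrapper might look like this. It assumes the `ast-grep` binary is on PATH and uses its `run --pattern … --lang …` form; the token estimate is a rough chars/4 heuristic, and both helper names are hypothetical.

```python
import shutil
import subprocess


def ast_grep_cmd(pattern: str, paths: list[str], lang: str = "ts") -> list[str]:
    """Build an ast-grep search command in the same shape as the queries above."""
    return ["ast-grep", "run", "--pattern", pattern, "--lang", lang, *paths]


def structural_search(pattern: str, paths: list[str], lang: str = "ts") -> tuple[str, int]:
    """Run ast-grep if installed; return its output and a rough token estimate.

    The ~4 characters per token ratio is an assumption, not a tokenizer.
    """
    if shutil.which("ast-grep") is None:
        raise RuntimeError("ast-grep is not on PATH")
    result = subprocess.run(ast_grep_cmd(pattern, paths, lang),
                            capture_output=True, text=True, check=False)
    return result.stdout, (len(result.stdout) + 3) // 4
```

For example, `structural_search("passport.use(new $STRATEGY($$$))", ["src/"])` returns the matching call sites plus an estimate of what they would cost as a tool result. `check=False` is deliberate, since a search with no matches is not an error worth raising.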


Approach B: GSD + ck-search + Serena

GSD splits into fresh-context phases. The execute subagent starts clean with precision tools active, carrying none of the exploration history from earlier phases.

Turn  Phase / Action                             New tok  Cached   Output   Turn $    Running $
─────────────────────────────────────────────────────────────────────────────────────────────────
      ── DISCUSS SUBAGENT (fresh cold start) ──

   1  PROJECT.md + REQUIREMENTS.md + CLAUDE.md     5,000       0      600   $0.028    $0.028
      Cold start: "add OAuth, existing custom auth"
      [cache WRITE: 5,000 × $3.75/MTok]
      Output: scoped CONTEXT.md distillation

   2  Refine scope, confirm assumptions               200   5,600      400   $0.008    $0.036
      [cache READ: 5,600 × $0.30/MTok]

      ── PLAN SUBAGENT (fresh cold start) ──

   3  PROJECT.md + CONTEXT.md + REQUIREMENTS.md    6,500       0      800   $0.036    $0.072
      [FRESH context — discuss history gone]
      Output: XML PLAN.md (3 tasks: config, strategy, routes)

   4  Refine plan, confirm task breakdown             200   7,300      600   $0.012    $0.084

      ── EXECUTE SUBAGENT (fresh cold start, ck + Serena active) ──

   5  PROJECT.md + PLAN.md + ck tools + Serena      8,000       0      400   $0.038    $0.122
      Cold start. [FRESH context — 8k vs 24k in approach A]
      ck --sem "authentication patterns" → 350 tok
      Serena find_definition("AuthService") → 90 tok
      [440 tokens of discovery vs 8,000+ from rg in approach A]

   6  Serena find_references("AuthService")           200   8,840      600   $0.017    $0.139
      Read(auth/service.ts lines 80-160 only) → 1,200 tok
      [1,200 targeted tokens vs 4,500 for whole file]

   7  Plan OAuth approach (clean context,             200  10,640    1,500   $0.026    $0.165
      no noise — model has exactly what it needs)

   8  Implement OAuth config + passport strategy      200  12,340    3,000   $0.048    $0.213

   9  Wire up routes                                  200  15,540    2,500   $0.040    $0.253

  10  Write tests                                     200  18,240    2,000   $0.034    $0.287
─────────────────────────────────────────────────────────────────────────────────────────────────
TOTAL                                              21,150  18,240   12,900             $0.287

Context at turn 10 (execute subagent): ~20,000 tokens — 43% smaller than Approach A.

Discovery cost: $0.017. ck found the relevant patterns in 350 tokens. Serena located the exact symbol definition. A targeted Read of 80 lines replaced a full 4,500-token file read.

MCP overhead: ~$0.025 across the session (tool definitions, cached after turn 1 of each subagent).


Approach C: Spec-Kit + ck-search + Serena

Single session. Spec files loaded at cold start. ck + Serena keep discovery tight from the first turn, while the spec infrastructure keeps the model anchored throughout.

Turn  Action                                     New tok  Cached   Output   Turn $    Running $
──────────────────────────────────────────────────────────────────────────────────────────────
   1  Cold start. Spec files + system + tools:    10,700       0      400   $0.046    $0.046
      constitution(800) + stack(400) + spec(2k)
      + plan(1.5k) + tasks(800) + CLAUDE.md(3k)
      + ck tools(700) + Serena tools(1,500)
      [Largest cold start of the 4 approaches]

   2  ck --sem "authentication patterns"             350  11,100      500   $0.012    $0.058
      --threshold 0.7 --limit 8
      Returns: 8 × 150-char snippets = 350 tokens
      [vs 8,000 from rg in Approach A]

   3  Serena find_definition("AuthService")          200  11,950      400   $0.010    $0.068
      Serena find_references("AuthService") → 250 tok
      [~450 tokens vs reading 3 whole files]

   4  Read(auth/service.ts lines 80–160 only)      1,200  12,550      600   $0.017    $0.085
      [targeted section, not whole 4,500-tok file]

   5  Plan OAuth approach (spec context keeps        200  14,350    1,000   $0.020    $0.105
      model anchored to constitution + stack.md)

   6  Implement OAuth config                         200  15,550    2,000   $0.035    $0.140

   7  Implement passport strategy                    200  17,750    2,500   $0.043    $0.183

   8  Wire up routes                                 200  20,450    2,000   $0.037    $0.220

   9  Write tests                                    200  22,650    2,000   $0.037    $0.257

  10  Update tasks.md (checkbox flip — partial       800  24,850      500   $0.016    $0.273
      cache invalidation on tasks prefix)
──────────────────────────────────────────────────────────────────────────────────────────────
TOTAL                                              14,250  24,850   11,900             $0.273

Context at turn 10: ~26,000 tokens — but 7k of that is stable spec infrastructure, not noise. The spec files add overhead, but they’re cached and purposeful.

Discovery cost: $0.039. The large cold start is the spec file overhead, not discovery noise.
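The per-turn costs in these tables follow from the pricing note at the end of the post. Below is a minimal Python sketch that recomputes Approach C from its token columns. It assumes the cold start is billed as a cache write and that later new tokens are billed at the base input rate. The tables don't say which rate they apply to new tokens, so treat the result as an approximation.

```python
# Claude Sonnet 4.6 prices from the pricing note, in dollars per million tokens.
PRICE_PER_MTOK = {"input": 3.00, "cache_write": 3.75, "cache_read": 0.30, "output": 15.00}


def turn_cost(new: int, cached: int, output: int, cold_start: bool = False) -> float:
    """Dollar cost of one turn.

    Assumption: on a cold start, new tokens are billed as cache writes;
    on later turns they are billed at the base input price.
    """
    new_rate = PRICE_PER_MTOK["cache_write"] if cold_start else PRICE_PER_MTOK["input"]
    micro_dollars = (
        new * new_rate
        + cached * PRICE_PER_MTOK["cache_read"]
        + output * PRICE_PER_MTOK["output"]
    )
    return micro_dollars / 1_000_000


# (new, cached, output) for each turn, copied from the Approach C table.
APPROACH_C = [
    (10_700, 0, 400),
    (350, 11_100, 500),
    (200, 11_950, 400),
    (1_200, 12_550, 600),
    (200, 14_350, 1_000),
    (200, 15_550, 2_000),
    (200, 17_750, 2_500),
    (200, 20_450, 2_000),
    (200, 22_650, 2_000),
    (800, 24_850, 500),
]

# Only turn 1 is treated as a cold start.
costs = [turn_cost(*row, cold_start=(i == 0)) for i, row in enumerate(APPROACH_C)]
print(f"turn 1: ${costs[0]:.3f}  total: ${sum(costs):.3f}")
```

Under those assumptions, turn 1 comes out at $0.046, which matches the table. The ten turns sum to about $0.275, within a fifth of a cent of the table's $0.273.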


The Head-to-Head Summary

| | A: rg+Read | A’: ast-grep | B: GSD+Tools | C: Spec-Kit+Tools |
|---|---|---|---|---|
| Total (10 turns) | $0.376 | $0.263 | $0.287 | $0.273 |
| Savings vs A | | −30% | −24% | −27% |
| Context at T10 | ~35,000 tok | ~24,000 tok | ~20,000 tok | ~26,000 tok |
| Signal/noise ratio | Low | High | Very High | High |
| Discovery cost (T1–5) | $0.137 | $0.039 | $0.017 | $0.039 |
| MCP overhead | $0.00 | $0.00 | ~$0.025 | ~$0.025 |
| Phase cold starts | 1 | 1 | 3 | 1 |
| Projected cost at T20 | ~$0.75+ | ~$0.51 | ~$0.55 | ~$0.50 |
| Projected cost at T50 | likely fails | ~$1.30 | ~$1.30 | ~$1.20 |
| Implementation quality risk | ⚠️ drifts T15+ | ✅ stable | ✅ isolated | ✅ spec-anchored |

The Key Budget Insight

A’ beats B at 10 turns despite B using precision semantic tools. Why?

ck-search’s MCP definition overhead costs roughly $0.025 across a 10-turn session. That breaks down as about $0.011 in amortized cache reads ($0.00111/turn × 10 turns) plus roughly $0.014 for the initial cache write. ast-grep has zero overhead. When the codebase isn’t large enough for ck’s semantic index to massively outperform structural pattern matching, ast-grep wins on raw cost.

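Past turn 20, the precision-tool approaches close the gap. In the projections above, C edges ahead of A’ by turn 20 (~$0.50 vs ~$0.51), and B draws level with A’ by turn 50 (~$1.30 each). ck’s semantic savings compound as more of the codebase is explored, and the MCP overhead gets further amortized. In a 30-turn session on a large brownfield codebase, ck-search saves roughly 25,000 token-equivalents net after all overhead.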
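Here is a minimal sketch of that amortization. It assumes roughly 3,700 tokens of tool definitions, a figure back-solved from the $0.00111/turn number at the $0.30/MTok cache-read price. It also assumes a single session in which turn 1 writes the definitions and every later turn reads them. Because this counts nine reads rather than ten, it gives about $0.024 at 10 turns, in line with the ~$0.025 quoted above.

```python
CACHE_WRITE = 3.75 / 1_000_000  # dollars per token
CACHE_READ = 0.30 / 1_000_000   # dollars per token

# Assumption: ~3,700 tokens of MCP tool definitions, back-solved from the
# quoted $0.00111/turn at the $0.30/MTok cache-read price.
DEFINITION_TOKENS = 3_700


def mcp_overhead(definition_tokens: int, turns: int) -> float:
    """Turn 1 writes the tool definitions to cache; each later turn re-reads them."""
    if turns <= 0:
        return 0.0
    write = definition_tokens * CACHE_WRITE
    reads = definition_tokens * CACHE_READ * (turns - 1)
    return write + reads


for turns in (5, 10, 20, 30):
    total = mcp_overhead(DEFINITION_TOKENS, turns)
    print(f"{turns:>2} turns: ${total:.4f} total, ${total / turns:.5f} per turn")
```

Per-turn overhead falls toward the $0.00111 read cost as the session lengthens. The one-time write matters less the longer you stay.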

Where Each Dollar Goes

Looking at the cost breakdown by category makes the pattern obvious:

xychart-beta
    title "Where Each Dollar Goes — 10-Turn Brownfield (Approach A/A'/B/C)"
    x-axis ["A: rg+Read", "A': ast-grep", "B: GSD+ck", "C: Speckit+ck"]
    y-axis "Cost ($)" 0 --> 0.40
    bar [0.137, 0.039, 0.055, 0.039]
    bar [0.030, 0.021, 0.084, 0.046]
    bar [0.191, 0.158, 0.148, 0.188]
    bar [0.080, 0.005, 0.005, 0.005]

Bar groups per approach: Discovery · Planning/Overhead · Implementation · Noise waste

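| Approach | Discovery | Overhead | Implementation | Noise | Total |
|---|---|---|---|---|---|
| A: rg + Read | $0.137 (36%) | $0.030 | $0.191 | ~$0.080 | $0.376 |
| A’: ast-grep | $0.039 (15%) | $0.021 | $0.158 | ~$0.005 | $0.263 |
| B: GSD + Tools | $0.055 (19%) | $0.084 | $0.148 | ~$0.005 | $0.287 |
| C: Spec-Kit + Tools | $0.039 (14%) | $0.046 | $0.188 | ~$0.005 | $0.273 |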

The ast-grep Discovery Cheat Sheet

For structural queries, the token savings are dramatic enough to warrant a concrete spending comparison. These are the actual query substitutions from the OAuth brownfield scenario:

Query goal: rg command (rg tokens) → ast-grep equivalent (ast-grep tokens)

  • All auth-related code: rg "auth" . -C 3 (8,000+) → ast-grep 'class $N { $$$ }' src/auth/ (420)
  • Find middleware functions: rg "function.*req.*res" (2,000) → ast-grep 'function $N(req,res,next) { $$$ }' (300)
  • Find passport call-sites: rg "passport|jwt" (2,000) → ast-grep '$O.authenticate($$$)' (300)
  • Find express route registrations: rg "app\.(get|post)" (3,000) → ast-grep 'app.$M($PATH,$$$)' (250)
  • Find try/catch blocks: rg -A5 "try {" (5,000) → ast-grep 'try { $$$ } catch ($E) { $$$ }' (400)
  • Combined: 20,000 → 1,670

Roughly 92% fewer tokens (20,000 down to 1,670) for equivalent structural information. ast-grep isn’t a replacement for ck-search (no semantic understanding) or Serena (no symbol graph), but it’s the free upgrade you should always make before reaching for MCP tools.
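Discovery output stays in context, so its cost keeps growing with every later turn. The simplified Python sketch below assumes each result is billed as a cache write when it lands and as a cache read on every turn after that. The 10- and 50-turn horizons are illustrative choices, not measurements.

```python
CACHE_WRITE = 3.75  # $/MTok
CACHE_READ = 0.30   # $/MTok


def carried_cost(tokens: int, later_turns: int) -> float:
    """Cost of a tool result that enters context once and is re-read on every later turn.

    Simplification: billed as a cache write when it arrives, then as a
    cache read on each subsequent turn.
    """
    return tokens * (CACHE_WRITE + CACHE_READ * later_turns) / 1_000_000


# Combined discovery output from the query comparison above.
for tool, tokens in (("rg", 20_000), ("ast-grep", 1_670)):
    ten = carried_cost(tokens, later_turns=9)
    fifty = carried_cost(tokens, later_turns=49)
    print(f"{tool:>8}: ${ten:.3f} over 10 turns, ${fifty:.3f} over 50 turns")
```

Under these assumptions, the rg output costs about $0.13 by turn 10 and $0.37 by turn 50. The ast-grep results cost about $0.01 and $0.03 over the same horizons.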

Some useful structural patterns to keep handy:

# Async functions declared with the function keyword (arrow functions need their own pattern)
ast-grep --pattern 'async function $NAME($$$) { $$$ }' src/

# All React useEffect with dependency arrays
ast-grep --pattern 'useEffect(() => { $$$ }, [$$$])' src/

# All try-catch blocks (find error handling patterns)
ast-grep --pattern 'try { $$$ } catch ($E) { $$$ }' src/

# All class definitions in a directory
ast-grep --pattern 'class $NAME { $$$ }' src/auth/

# Express route handlers
ast-grep --pattern 'app.$METHOD($PATH, $$$)' src/routes/

What the Real Budget Data Shows

The theoretical models hold up against real data. 149 actual Claude Code sessions across three brownfield projects (commerce, leadingedje, jumpmind) validate the key claims.

Cache Is Delivering the Returns

Across 149 sessions in brownfield projects:

  • 751M total cache-read tokens (94.6% of all tokens)
  • 42.4M total cache-write tokens (cold-start writes)
  • 2.3M total output tokens (0.29% of all tokens)
  • 97.9% average session cache hit rate
  • $2,028 in cache savings realized ($2,253 → $225 actual)
  • $247 total cost vs. $2,475 uncached (10× savings)

Confirmed: the theoretical 90% cache savings holds in production. This is real money recovered from what would otherwise be a very different bill.

Cache Hit Rate Grows Over a Session

Session a3352099 (commerce feature, no extra MCPs) was captured at 11 incremental snapshots. Seven of them, below, show the cache warming effect in live numbers:

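| Turn | Total tokens | Cache read % | Cost/turn |
|---|---|---|---|
| 4 | 73.9k | 56.1% | $0.02025 |
| 19 | 587.1k | 78.9% | $0.01952 |
| 22 | 701.2k | 82.2% | $0.01796 |
| 24 | 779.7k | 83.7% | $0.01764 |
| 35 | 1,290.4k | 86.2% | $0.01805 |
| 97 | 4,757.3k | 91.9% | $0.01752 |
| 108 | 5,552.6k | 92.8% | $0.01741 |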

Cost per turn decreases from $0.020 to $0.017 as the session runs, even as total tokens grow 75×. The cache hit rate climbs from 56% to 92.8%. Cache savings outpaced context growth.
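The mechanism behind this is a falling blended input price. The rough sketch below treats the cache-read percentage as the fraction of input tokens billed at the read price. It assumes every other input token is billed as a cache write. Real sessions also include some base-price input, so the numbers are approximate.

```python
CACHE_READ = 0.30   # $/MTok for tokens served from cache
UNCACHED = 3.75     # $/MTok, assuming cache misses are written to cache


def blended_input_price(cache_hit: float) -> float:
    """Average $/MTok across all input tokens for a given cache-hit fraction."""
    return cache_hit * CACHE_READ + (1 - cache_hit) * UNCACHED


# Cache-read percentages from the first and last snapshots of session a3352099.
early = blended_input_price(0.561)
late = blended_input_price(0.928)
print(f"turn 4: ${early:.2f}/MTok, turn 108: ${late:.2f}/MTok, {early / late:.1f}x cheaper")
```

Under that assumption, each input token gets about 3.3× cheaper between turn 4 and turn 108. That is how per-turn cost can drift down while the session keeps growing.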

The spikes in cost between snapshots correlate with code-generation output bursts, not context accumulation. This is the output-cost paradox made visible: it’s the 0.29% of tokens that are output, not the 94%+ that are cached, driving cost during active coding.

Serena Reduces Total Spend, Not Per-Turn Rate

| | Without Serena | With Serena |
|---|---|---|
| Sessions | 131 | 12 |
| Avg tokens/turn | 40,071 | 34,215 (−15%) |
| Avg cost/turn | 1.78¢ | 1.81¢ |
| Avg cache hit % | 99.17% | 99.98% |
| Avg session cost | $1.79 | $1.02 (−43%) |

Per-turn cost is nearly identical — Serena’s overhead gets baked into the cached prefix. Total session cost is 43% lower because Serena sessions end sooner. Symbol-precision navigation finds what’s needed in fewer turns.

This validates the key point about how to think about these tools: Serena doesn’t save budget per call. It saves budget by shortening the total session. The right mental model is “fewer turns needed,” not “cheaper turns.”
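A quick back-of-the-envelope check supports the “fewer turns” reading. Dividing average session cost by average per-turn cost gives an implied session length. This is a ratio of averages, so treat it as an approximation rather than the true mean turn count.

```python
def implied_turns(avg_session_dollars: float, avg_turn_cents: float) -> float:
    """Approximate session length: average session cost divided by average per-turn cost."""
    return avg_session_dollars * 100 / avg_turn_cents


# (label, avg session cost in $, avg cost per turn in cents) from the Serena comparison table.
for label, session_dollars, turn_cents in (
    ("without Serena", 1.79, 1.78),
    ("with Serena", 1.02, 1.81),
):
    print(f"{label}: ~{implied_turns(session_dollars, turn_cents):.0f} turns per session")
```

That works out to roughly 101 turns without Serena and roughly 56 with it. This is consistent with the 43% drop in session cost coming from shorter sessions rather than cheaper turns.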

The Extreme Case

The single most expensive session (9893d490, commerce project):

  • Turns: 1,596
  • Duration: 44.6 hrs, across multiple days
  • Total tokens: 158M
  • Cache hit rate: 100%
  • Actual cost: $44.01
  • Without cache (est.): ~$440+, 10× more expensive

A 44-hour, 1,596-turn session, with a 100% cache hit rate throughout. Caching is what made this session practical at all. Without it, the bill would have been roughly ten times larger. This is what good cache discipline looks like at the extreme end of the budget.


Conversational Sessions: A Different Budget

When you’re using Claude for research, Q&A, or explanation — not coding — the budget flips in interesting ways.

Data from 26 short Q&A-style sessions (1–15 turns, non-subagent) in the real dataset:

Average turns per session:         4.8
Average total tokens:             92,830
Average output %:                  0.59%   ← tiny relative to total
Average fresh input %:             0.01%   ← essentially zero
Average cache hit rate:           88.5%    ← lower than coding (cold start penalty)
Average cost per session:         $0.078
Average cost per turn:            2.10¢

The standout number is output at 0.59%. In a typical Q&A session with 92k total tokens, only ~550 tokens are actual new output. The other 99.4% is input context (system prompt, tool definitions, conversation history), nearly all of it either written to or read from the cache.

The Conversational Budget Is Inverted

In coding sessions, output dominates cost — generating 3,000 tokens of code at $15/MTok is expensive. In conversational sessions, cache writes dominate — the cold start infrastructure costs more than all the answers combined:

pie title Q&A Session Cost Structure (26 sessions, $3.47 total)
    "Cache writes — cold start overhead" : 78
    "Cache reads — conversation history" : 14
    "Output — actual answers" : 7
    "Fresh input — user messages" : 1

You are paying 11× more to set up the context than to get answers from it.

This is a direct consequence of the cold start penalty. In conversational sessions, the warm-up is most of the bill. Which means the budget advice is simple: don’t start new sessions for follow-up questions. Keep going in the same conversation.
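Here is a simplified sketch of why continuing beats restarting. It uses hypothetical numbers: a 15,000-token prefix (system prompt, tools, CLAUDE.md), 100-token questions, 500-token answers, and five questions. It assumes every follow-up arrives while the cache is still warm. It also assumes each turn's new segment is billed as a cache write and everything already in context as a cache read.

```python
CACHE_WRITE, CACHE_READ, OUTPUT = 3.75, 0.30, 15.00  # $/MTok


def session_cost(prefix: int, question: int, answer: int, turns: int) -> float:
    """One conversation: the prefix is written once, then read from cache on later turns.

    Simplified model: each turn's new segment (the question, plus the previous
    answer on turns after the first) is billed as a cache write, and everything
    already in context is billed as a cache read.
    """
    total, cached = 0.0, 0
    for t in range(turns):
        new = question + (prefix if t == 0 else answer)
        total += new * CACHE_WRITE + cached * CACHE_READ + answer * OUTPUT
        cached += new
    return total / 1_000_000


# Hypothetical workload: five follow-up questions on a 15k-token prefix.
stay = session_cost(15_000, 100, 500, turns=5)
restart = 5 * session_cost(15_000, 100, 500, turns=1)
print(f"one session: ${stay:.3f}, five fresh sessions: ${restart:.3f}")
```

In this toy model, five fresh sessions cost about 2.6× as much as one continued conversation. The whole difference comes from paying the prefix cache write five times instead of once.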

Medium-length conversational sessions (16–50 turns) are the cheapest per turn at roughly 1.42¢ — the sweet spot where the cold start is fully amortized and context hasn’t grown large enough to dominate.

A 30-turn research conversation — exploring a topic, asking follow-ups, iterating on understanding — costs about $0.95 total. Under a dollar for a thorough deep-dive. Without caching, the same session would cost $6–9.


Your Token Budget Playbook

The decision isn’t “which tool is best.” It’s “which spending pattern fits my current situation.”

Discovery Spending: Choose the Right Tool for the Job

Use rg when: you know exactly what string or symbol you’re looking for.

rg "processPayment" src/ -l          # filenames only first
rg "processPayment" src/payments/    # then scope to that directory

The anti-pattern: rg "error" . — returns thousands of lines from a brownfield codebase. Always scope to a subdirectory. Always use -l first.

Use ast-grep when: you know the structure of what you’re looking for, not the exact name. Structural refactors, finding all implementations of a code pattern, understanding code shape.

Zero MCP overhead, no setup required. This is your free upgrade from rg before reaching for MCPs.

Use ck-search when: you know the concept but not the symbol name. “Find where error handling is centralized.” “Show me config loading patterns.” Works best in long brownfield sessions where the MCP overhead is amortized and the semantic precision pays off repeatedly.

Optimal agent settings: --threshold 0.7 --limit 10 --snippet-length 150

Use Serena when: you have a specific symbol and need its full impact. Find all callers of processPayment, map the AuthService inheritance tree, list every IRepository implementation. Best ROI chained after ck: ck surfaces the symbol, Serena maps its graph.

Session Architecture: Choose the Right Budget Model

Use GSD when: the feature is complex enough that context rot is a real risk. More than 15 turns, multiple interdependent files, parallel workstreams. Phase isolation prevents accumulated noise from degrading quality in late-session turns. Think of each phase as its own budget envelope.

Use Spec-Kit when: you’re doing spec-driven work on a long-running team project and need the model consistently anchored to project requirements. The spec files are pre-paid signal — they replace expensive rediscovery at every session start. Run speckit.archive.run after every merge to keep the budget lean.

Use nothing but native tools when: greenfield prototype, simple Q&A, fewer than 5 turns, or you already know exactly which files to change. No cold start overhead, no spec loading, just direct interaction.

The decision tree is roughly: Am I likely to drift past 15 turns in a brownfield codebase? → GSD. Am I doing ongoing feature work that needs consistent project context? → Spec-Kit. Everything else? → native tools.
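The same decision tree can be written as a function. This is purely a sketch: the function and parameter names are hypothetical, and the 15-turn threshold comes from the text. The questions are checked in the order the text asks them, so a long brownfield task resolves to GSD even when it also needs project context.

```python
def choose_architecture(brownfield: bool, expected_turns: int, needs_project_context: bool) -> str:
    """Rough encoding of the session-architecture decision tree above."""
    if brownfield and expected_turns > 15:
        return "GSD"  # phase isolation keeps late turns free of early exploration noise
    if needs_project_context:
        return "Spec-Kit"  # spec files anchor ongoing feature work to project requirements
    return "native tools"  # short or simple work: skip the cold-start overhead


print(choose_architecture(brownfield=True, expected_turns=30, needs_project_context=False))
```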

MCP Budget: Enable/Disable by Session Type

| Situation | ck-search | Serena | Reason |
|---|---|---|---|
| New greenfield, < 20 files | OFF | OFF | Nothing to search yet |
| Brownfield, exact file known | OFF | OFF | Just Read it |
| Brownfield, concept exploration | ON | OFF | ck saves 5–10× on discovery |
| Brownfield, refactor/blast radius | ON | ON | Serena’s reference maps essential |
| Debugging with stack trace | OFF | OFF | You have the location already |
| Debugging with symptoms only | ON | OFF | ck for concept-based search |
| Short session (< 5 turns) | OFF | OFF | Neither breaks even without amortization |
| Active 20+ turn brownfield session | ON | ON | Full amortization active |

The cleanest implementation is a per-project .mcp.json at the repository root, which is where Claude Code reads project-scoped MCP servers from:

// .mcp.json in a brownfield project
{
  "mcpServers": {
    "ck-search": { "command": "ck", "args": ["--serve"] },
    "serena": { "command": "serena", "args": ["--stdio"] }
  }
}
// .mcp.json in a greenfield project (intentionally empty)
{
  "mcpServers": {}
}

MCPs auto-enable for the right projects and stay off everywhere else. No manual toggling required.


The Three Things That Move Your Budget Most

1. Discovery hygiene.

How you find code before changing it determines 30–40% of your session budget in brownfield scenarios. Broad rg queries inject thousands of tokens of noise that persist for the entire session. ast-grep, ck-search, and Serena exist specifically to replace that noise with signal. The roughly 92% token reduction from ast-grep vs rg for structural queries (from 20,000 tokens to 1,670) isn’t a rounding error. It’s real money at scale, and it compounds across every turn that carries that context.

2. Caching discipline.

The 90% input cost reduction from prompt caching is the biggest single lever in the system. Stay in sessions rather than restarting. Don’t edit CLAUDE.md mid-session. Don’t add MCPs mid-session. Don’t walk away for more than five minutes. Short, focused sessions are dramatically cheaper than interrupted ones. Real-world data confirms: 94.6% of all tokens were served from cache across 149 sessions, saving over $2,000 compared to what full-price input would have cost.

3. Session architecture.

Individual tool choices optimize turns. Session architecture optimizes the whole budget. GSD’s phase isolation means discovery noise from the discuss phase is gone before execution starts. Spec-Kit’s structured context means the model carries high-signal spec files instead of re-discovering project conventions from scratch. Both solve the same root problem — context accumulation and quality drift — from different architectural angles. Neither makes sense for short simple tasks. Both are clearly worth it once complexity crosses their breakeven threshold.

The real cost of Claude Code is not in the tokens it generates. It’s in what you put into context before asking it to generate anything.


Where to Start Spending Smarter

If you take nothing else from this: stop reaching for rg before you know what you’re looking for.

The single highest-ROI change on a brownfield codebase is replacing broad ripgrep queries with ast-grep for structural searches. No MCP setup. No configuration. Just install it and replace rg "function authenticate" with ast-grep -p 'function authenticate($$$) { $$$ }'. The context noise reduction is immediate, and free.

From there, think about your budget in layers:

  • Discovery layer: Add ck-search if you’re doing semantic discovery across a large codebase. Add Serena if you need precise symbol navigation and call-graph traversal.
  • Caching layer: Keep CLAUDE.md stable mid-session, don’t add MCPs ad hoc, stay in sessions instead of restarting.
  • Architecture layer: Add GSD or Spec-Kit if you’re managing complex multi-phase tasks where context drift is a real risk. Think of them as budget envelopes, not just workflow tools.

The tools compound. Discovery hygiene at turn 1 reduces noise for every turn that follows. Clean architecture means phase budgets don’t bleed into each other. And once your cache is warm, the whole system gets cheaper by the turn.

That’s the budget. Spend it well.


Further Reading


Tools referenced: ck-search, Serena, GSD, Spec-Kit, ast-grep.

Pricing: Claude Sonnet 4.6 at $3.00/MTok input, $3.75/MTok cache write, $0.30/MTok cache read, $15.00/MTok output.

Empirical data from 149 sessions across three brownfield projects, 2026-03-27 through 2026-04-14.