Budgeting Your Claude Code Tokens
I used to watch the wrong number.
When I first started paying for Claude Code, I’d keep one eye on the output token counter — watching it tick up as code got written, assuming that’s where the money was going. Turns out, I was fixating on the receipt for a small coffee while my rent went unpaid.
The real cost isn’t what Claude generates. It’s what you put into context before asking it to generate anything.
Every Claude Code session is a budget. The meter starts at $0 and every decision moves it: how you search for code, how long you keep your session open, whether your cache is warm or cold, and what you accidentally drag into context and never let go. Some of those spending decisions are wildly efficient. Others are quiet budget leaks that compound across every turn you take.
This post is a spending guide. It draws on analytical models and empirical data from 149 real sessions to answer one question: where does the money actually go — and how do you stop wasting it?
TL;DR: Discovery — not code generation — accounts for 30–40% of session cost in brownfield codebases. Switching from broad ripgrep queries to ast-grep or ck-search reduces discovery tokens by 70–88%. Prompt caching cuts input costs by 90%. But cold starts, context noise, and poor session architecture erode those savings faster than most developers realize. Tools like GSD and Spec-Kit exist specifically to solve the architectural problem — not as productivity wrappers, but as budget management systems.
Written for developers using Claude Code or GitHub Copilot on production codebases — especially anyone evaluating whether workflow or search tooling is worth the setup cost.
Your Token Budget Has Four Line Items
Before any tooling talk, it helps to have a mental model. Your per-session token budget breaks down into four categories:
| # | Line item | What drives it | The lever |
|---|---|---|---|
| 1 | Discovery | How you find code before changing it | Tool choice — rg vs ast-grep vs ck-search |
| 2 | MCP tool overhead | Fixed tax per turn for registered servers | Hygiene — only enable what you’ll use |
| 3 | Caching | How much of your context is served from cache | Session discipline — stay in sessions, protect the prefix |
| 4 | Output | What Claude generates | Precision — correct on first try |
That’s the whole game. Everything else is commentary on these four numbers.
Discovery is your biggest variable expense — and most developers never look at it. MCP overhead is a fixed per-turn tax that compounds silently. Caching is your biggest savings lever — and most developers accidentally break it. Output looks tiny in token volume but hits hardest in dollars per token.
Let’s go through each one.
Line Item 1: Discovery
Discovery is how Claude Code finds the code it needs before writing anything. It’s also the most surprising line item for most developers — because unlike output tokens, discovery tokens are invisible in the UI.
The primitives break down like this.
Read
Read injects the entire file into the conversation as a tool result. A 500-line TypeScript file costs roughly 3,000–5,000 tokens. Those tokens stay in conversation history for every subsequent turn, not just the current one.
Read("src/auth/service.ts") → entire file injected into context
→ 500-line file ≈ 3,000–5,000 tokens
→ persists in ALL subsequent turns
Read is the most expensive primitive per call, and its cost compounds. Read ten files across the first five turns, and those tokens are being re-sent to the API at turn 25. The model isn’t “forgetting” earlier content — quite the opposite. The conversation history becomes a graveyard of file dumps that keeps growing the per-turn context.
Read is also the right tool when you know exactly which file you need. The issue isn’t Read — it’s reaching for Read before you know which file matters.
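To budget a Read before issuing it, the 3,000–5,000 token figure for a 500-line file works out to roughly 6–10 tokens per line. The sketch below applies that ratio to a line count. The function name and the per-line bounds are assumptions derived from that one figure; it is a heuristic, not a tokenizer.

```python
# Heuristic only: bounds derived from "500 lines ≈ 3,000–5,000 tokens".
TOKENS_PER_LINE_LOW = 3_000 / 500   # 6 tokens per line
TOKENS_PER_LINE_HIGH = 5_000 / 500  # 10 tokens per line


def estimate_read_tokens(line_count: int) -> tuple[int, int]:
    """Return a (low, high) token estimate for reading a whole file."""
    low = round(line_count * TOKENS_PER_LINE_LOW)
    high = round(line_count * TOKENS_PER_LINE_HIGH)
    return low, high


print(estimate_read_tokens(500))  # (3000, 5000)
print(estimate_read_tokens(120))  # (720, 1200)
```

Dense or heavily commented files will fall outside this range, so treat the result as an order-of-magnitude check before loading a file you are not sure you need.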
ripgrep (rg)
Grep runs ripgrep under the hood. Token cost depends on how many lines match and how much surrounding context you request.
rg "authenticate" src/ # every matching line across every file
rg -C 3 "authenticate" src/ # matching lines + 3 lines before/after each
ripgrep has no concept of relevance. Every match is equal. A broad query on a brownfield codebase is a token bomb: rg "error" . on a 50k LOC project returns 800+ lines, or 4,000 to 8,000 tokens for a single tool call, all of it persisted in history.
The secondary cost is output tokens. The model receives all those matches and has to mentally filter them, burning output tokens to reason about noisy input before it can form a plan.
Glob
Glob is cheap — it returns only file paths, not contents.
glob("**/*.ts") → list of file paths
→ 200 files ≈ 400–800 tokens (just names)
Best used to discover what exists before committing to Read or rg. The “filenames first” pattern (rg "processPayment" src/ -l) is the same idea applied to content search: confirm where something lives before you load the whole file.
Bash
Arbitrary shell commands. Token cost equals the length of stdout/stderr. git log --oneline -20 is cheap. cat large_file.json is a Read in disguise.
Line Item 2: The MCP Registration Tax
Here’s what nobody tells you about MCP servers: you pay for them on every single turn, whether you use them or not.
Every API request includes the complete tool definitions for all registered MCP servers — the tool names, descriptions, parameters, and schema. This overhead is unavoidable and applies even before a single tool is called. The native Claude Code tools cost ~800 tokens per turn. Add ck-search and Serena and that jumps to ~3,700 tokens — on every turn, for the life of the session.
The Per-Turn MCP Tax
| MCPs active | Tool def tokens/turn | Amortized cost/turn (cached, after turn 1) |
|---|---|---|
| None (native only) | ~800 | $0.00024 |
| + ck-search (6 tools) | ~1,500 | $0.00045 |
| + Serena (~15 tools) | ~3,000 | $0.00090 |
| + Both | ~3,700 | $0.00111 |
Pricing: Sonnet 4.6 at $3.00/MTok input, $0.30/MTok cache read.
The amortized column is what you pay after the cache warms up — turn 2 onward. These numbers seem small in isolation, but the effect compounds.
A 30-turn session with both MCPs active but never called:
Cache write (turn 1): 3,700 × $3.75/MTok = $0.0139
Cache reads (turns 2–30): 29 × 3,700 × $0.30/MTok = $0.0322
Total wasted: $0.046 on tool definitions for MCPs you never used
Run five such sessions per day for a week:
$0.046 × 5 sessions × 5 days = $1.15/week in pure overhead
The rule is simple: an unused MCP is a tax on every turn.
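The 30-turn arithmetic above can be reproduced with a small helper. This is a sketch using only the prices and token counts quoted in this post; the function name is made up for illustration. Because the ~3,700-token figure includes the ~800 tokens of native tool definitions, the last line separates out the share that comes from the MCP servers themselves.

```python
# Prices per million tokens, as quoted in this post.
CACHE_WRITE_PER_MTOK = 3.75
CACHE_READ_PER_MTOK = 0.30


def tool_definition_cost(def_tokens: int, turns: int) -> float:
    """Cost of carrying tool definitions: one cache write, then cache reads."""
    write = def_tokens * CACHE_WRITE_PER_MTOK / 1_000_000
    reads = (turns - 1) * def_tokens * CACHE_READ_PER_MTOK / 1_000_000
    return write + reads


both = tool_definition_cost(3_700, turns=30)
native_only = tool_definition_cost(800, turns=30)
print(f"both MCPs registered: ${both:.4f}")                 # $0.0461
print(f"native tools only:    ${native_only:.4f}")          # $0.0100
print(f"attributable to MCPs: ${both - native_only:.4f}")   # $0.0361
```

The model assumes the cache stays warm for the whole session; any TTL expiry adds another write at the higher rate.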
The other MCP cost is latency. A slow cold LSP initialization (Serena’s language server starting up) can add extra reasoning turns while the model waits, generating unnecessary output tokens. Register MCPs per-project rather than globally so they’re only active when actually useful.
Line Item 3: Caching — Your Best Investment
Claude’s prompt caching is the single most impactful optimization in the entire system, and it largely runs automatically. If discovery is where you can overspend, caching is where you win it back.
How the Cache Works
The cache is keyed on exact prefix hashes. After turn 1, all prior context — system prompt, tool definitions, conversation history — is eligible to be served from cache:
sequenceDiagram
participant Ctx as Context
participant Cache
participant LLM
Note over Ctx,LLM: Turn 1 — Cold start (cache WRITE)
Ctx->>LLM: system + tools + user(1)
LLM-->>Cache: WRITE prefix hash
Note over Ctx,LLM: Turn 2 — 90% cache hit
Ctx->>Cache: system + tools + user(1)
Cache-->>Ctx: HIT — 90% savings on prefix
Ctx->>LLM: asst(1) + user(2) [new only]
LLM-->>Cache: WRITE new suffix
Note over Ctx,LLM: Turn 3 — Growing hit
Ctx->>Cache: system + tools + user(1) + asst(1)
Cache-->>Ctx: HIT
Ctx->>LLM: user(2) + asst(2) + user(3) [new only]
LLM-->>Cache: WRITE new suffix
Each turn, the cache hit grows. By turn 5, roughly 80% of context is being served from cache. By turn 10, that figure is closer to 87–90%.
The Pricing Math
| Token type | Price (Sonnet 4.6) | vs. regular input |
|---|---|---|
| Regular input | $3.00/MTok | 1× |
| Cache write (5-min TTL) | $3.75/MTok | 1.25× |
| Cache write (1-hr TTL) | $6.00/MTok | 2× |
| Cache read | $0.30/MTok | 0.1× |
| Output | $15.00/MTok | 5× input |
A 40,000-token context hitting cache costs $0.012 per turn instead of $0.12. Over a 30-turn session, that’s the difference between $3.60 and $0.36 on input costs alone.
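The comparison above is straightforward to check. The sketch below hard-codes the rates from the pricing table and assumes, as the paragraph does, a context that stays at 40,000 tokens for all 30 turns. The dictionary keys and the helper name are illustrative.

```python
# Rates from the pricing table above, in dollars per million tokens.
PRICE = {
    "input": 3.00,
    "cache_write_5m": 3.75,
    "cache_read": 0.30,
    "output": 15.00,
}


def dollars(tokens: int, kind: str) -> float:
    """Convert a token count to dollars at the given rate."""
    return tokens * PRICE[kind] / 1_000_000


context = 40_000
uncached = dollars(context, "input")
cached = dollars(context, "cache_read")
print(f"per turn: ${uncached:.3f} uncached vs ${cached:.3f} cached")
print(f"30 turns: ${30 * uncached:.2f} uncached vs ${30 * cached:.2f} cached")
```

Real sessions grow turn by turn, so the flat 40k figure is a simplification; the ratio between the two columns is what carries over.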
Real session data confirms this: across 149 actual Claude Code sessions, 94.6% of all tokens were served from cache, saving $2,028 in input costs — sessions that would have cost roughly $2,475 without caching cost $247 with it.
The Cold Start Penalty
The cold start penalty is real and measurable in the empirical data:
| Turn count bucket | Sessions | Cache hit % | Avg cost/turn |
|---|---|---|---|
| 1–10 turns | 27 | 88.9% | 2.19¢ |
| 11–30 turns | 44 | 99.9% | 1.40¢ |
| 31–100 turns | 48 | 99.9% | 1.51¢ |
| 101–300 turns | 23 | 100.0% | 2.66¢ |
| 300+ turns | 4 | 100.0% | 2.76¢ |
Short sessions are expensive for their size. The 1–10 turn bucket has the fewest tokens per turn, yet at 2.19¢ it costs 56% more per turn than the 11–30 bucket (1.40¢), because a larger share of its tokens is billed at cache-write or uncached rates (88.9% cache hits vs 99.9%). Once the cache warms after turn 3–5, the effective per-turn cost drops by 36%.
The 100+ turn buckets are expensive for a different reason: context has grown very large (89k–100k tokens/turn). Their 100% cache hit rates show caching is working perfectly — they’re just caching more content.
Two single-turn sessions from the same day on the same project illustrate the cold start penalty most starkly:
| Session | Turns | Tokens | Cache hit | Cost |
|---|---|---|---|---|
| A (cache cold) | 1 | 25,700 | 0.0% | $0.0566 |
| B (cache warm) | 1 | 16,200 | 100% | $0.0062 — 9× cheaper |
Same question. Same model. The only difference is whether the context was already cached.
What Breaks the Cache
| Change | Cache effect |
|---|---|
| New MCP registered mid-session | Invalidates from tool definitions forward |
| CLAUDE.md edited mid-session | Invalidates from system prompt |
| New turn added | Cache hit on all prior turns; only new content is fresh |
| tasks.md checkbox updated (Spec-Kit) | Partial invalidation on tasks prefix |
| Session gap > 5 minutes (default TTL) | Full cold start on next turn |
The session gap issue is subtle. Walk away for ten minutes mid-session and the next turn pays uncached rates on the entire history. For a 30k-token context, that’s $0.090 at the base input rate (or $0.1125 if the prefix is re-written to cache at $3.75/MTok) vs $0.009 with a warm cache.
Line Item 4: Output
Output tokens make up only 0.29–0.59% of total token volume in a typical session — but they account for a disproportionate share of the bill.
At $15/MTok, output costs 50× more per token than a cache read. A turn generating 3,000 tokens of code costs $0.045 in output alone — roughly equal to the cache-read cost of a 150,000-token context. Once your cache is warm and large, output generation becomes the dominant variable cost. It’s the line item that spikes when something goes wrong.
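One way to keep this ratio in mind is to convert an output size into the cached-context size that costs the same. A minimal sketch using the two rates quoted above; the function name is made up for illustration.

```python
OUTPUT_PER_MTOK = 15.00
CACHE_READ_PER_MTOK = 0.30


def equivalent_cached_tokens(output_tokens: int) -> int:
    """Cached-context size whose read cost equals the given output cost."""
    return round(output_tokens * OUTPUT_PER_MTOK / CACHE_READ_PER_MTOK)


print(equivalent_cached_tokens(3_000))  # 150000
print(equivalent_cached_tokens(500))    # 25000
```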
The only lever here is precision: ask clearly, provide the right context, and get it right the first time. The best search tools (coming up) help here indirectly — a model with a clean, precise context window generates better output on the first try and needs fewer correction turns.
How Codebase Size Multiplies Every Line Item
All four line items above are real in any session, but their proportions shift dramatically based on how large and how old your codebase is. For greenfield projects, most of this complexity evaporates.
Greenfield: Low Stakes, Flat Curve
On a greenfield project, the budget is almost boring. Context sources are limited to spec artifacts and files you just created. rg results are sparse. The token trajectory is flat and predictable. Cache hit rates are high because the system prompt (your CLAUDE.md or constitution) stays stable.
On a project with 20 files, rg "processPayment" src/ returns a few clean results. The model gets a clean signal. Context stays under 20k tokens for most sessions. rg is fine.
The only real greenfield risk is over-engineering your governance docs. A verbose CLAUDE.md adds 3,000+ tokens to every turn for no practical benefit. Keep it under 800 tokens.
Brownfield: Where All Four Line Items Get Expensive
In a brownfield codebase, every line item amplifies. Discovery is heavy because existing code matches everything. Reads are frequent because you need to understand patterns before you can change them. Token trajectory is steep early. And because discovery noise accumulates in context, it keeps raising your caching costs for the rest of the session — even after it’s stopped being relevant.
On a codebase with 40k lines of code, rg "auth" src/ -C 3 might return 8,000 tokens in a single call. That result sits in context for the next 40 turns, costing $0.30/MTok on every cache read even after it’s stopped being useful.
The Discovery Cost Problem
The comparison for understanding an existing auth system makes the scale of the problem concrete:
| Discovery approach | Token cost |
|---|---|
| rg "auth" . -C 5 | 8,000–20,000 tokens |
| rg with 3–4 specific patterns | 3,000–8,000 tokens |
| ck --sem "authentication" --threshold 0.7 --limit 10 | 400–800 tokens |
| Serena: find_symbol + get_references | 200–400 tokens |
| ck + Serena combined | 600–1,200 tokens |
The 20× token difference between naive rg and a ck/Serena combination isn’t just a turn-1 cost. Those extra 19,000 tokens stay in context for the rest of the session, costing $0.30/MTok on every subsequent turn’s cache read. In a 30-turn session, 19,000 excess context tokens add:
19,000 × $0.30/MTok × 30 turns = $0.171 in extra cache reads alone
That’s before accounting for the quality degradation from the model reasoning through noisy context.
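The carrying cost generalizes to any block of tokens that lingers in context. The sketch below assumes every later turn reads the block from a warm cache at $0.30/MTok; the function name is illustrative.

```python
CACHE_READ_PER_MTOK = 0.30


def carry_cost(excess_tokens: int, turns_carried: int) -> float:
    """Cache-read cost of tokens that stay in context for a number of turns."""
    return excess_tokens * CACHE_READ_PER_MTOK / 1_000_000 * turns_carried


# 19,000 excess tokens over a 30-turn session.
print(f"${carry_cost(19_000, 30):.3f}")  # $0.171
# An 8,000-token rg result carried for the next 40 turns.
print(f"${carry_cost(8_000, 40):.3f}")   # $0.096
```

Any turn that misses the cache re-bills the same tokens at the write rate, so these figures are a lower bound.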
The Brownfield Trap
The main brownfield anti-pattern is this sequence:
- Ask Claude to add a feature to an existing codebase
- Claude runs a few broad rg queries to understand the codebase
- Those queries return thousands of tokens of context — most of it not directly relevant
- The model plans and implements while carrying that noisy context
- By turn 15, the model is hallucinating or drifting because the noise-to-signal ratio is too high
- You spend 3–5 correction turns fixing the drift, each carrying that same noisy context
The cost of discovery noise isn’t just the tokens it injects — it’s every correction turn it causes.
Managing the Budget at Scale: Session Architecture
Here’s the thing about the four line items above: you can optimize each one individually, and you’ll still overspend if your session is badly architected.
Discovery noise from turn 2 is still in context at turn 50. Cache TTLs expire mid-work and your next turn re-pays full price on everything. Output quality degrades as the model reasons through accumulated noise. Each of these compounds the others. What starts as a $0.08 discovery cost at turn 2 becomes a $0.30+ drag by turn 30, not because the search itself got more expensive, but because you keep paying to carry that noise on every cached turn that follows.
This is the problem that GSD and Spec-Kit were designed to solve — not as productivity wrappers, but as architectural approaches to context budget management. Understanding them through that lens makes their trade-offs obvious.
GSD: A Fresh Budget Per Phase
GSD (get-shit-done) is built around one core idea: by the time you’re implementing, you shouldn’t be paying for your exploration history.
The architecture separates work into phases, each running as a fresh subagent:
flowchart LR
subgraph inputs["Persistent Inputs"]
PM[PROJECT.md]
RM[REQUIREMENTS.md]
RO[ROADMAP.md]
end
subgraph discuss["Discuss Phase (fresh subagent)"]
D[/gsd-discuss-phase/]
end
subgraph plan["Plan Phase (fresh subagent)"]
P[/gsd-plan-phase/]
end
subgraph execute["Execute Phase (parallel fresh subagents)"]
E1[Task 1]
E2[Task 2]
EN[Task N]
end
PM & RM & RO --> D
D -->|CONTEXT.md| P
PM & RM --> P
P -->|PLAN.md XML| E1 & E2 & EN
PM --> E1 & E2 & EN
The execute subagent is the key. It starts with a clean context containing only what it needs for its specific task. It doesn’t carry the discussion history, the planning back-and-forth, or the codebase exploration noise from earlier phases.
This is a budget reset. Each phase pays its own cold start. Each phase’s discovery tokens never pollute the next phase’s context. The discuss phase can make broad rg queries without penalty, because that noise disappears before execute ever sees a token of it.
GSD’s XML plan format creates a stable prefix for Anthropic’s cache. The content before <tasks> rarely changes, so parallel subagents executing different tasks from the same plan can share cache writes, and re-runs of the same plan see a high cache hit rate.
Built-in token optimizations:
- Prompt thinning (v1.36.0+): Automatically reduces prompt size for sub-200k models when context pressure rises
- Knowledge graph (/gsd-graphify): Structured relationships instead of prose RESEARCH.md, for lower token cost and higher information density
- Wave execution: Plans run in parallel subagents with fresh contexts, with no cross-contamination
- discuss_mode: assumptions: For brownfield, reads your code instead of asking questions, which eliminates question-round tokens
For brownfield specifically: Run /gsd-map-codebase first. It spawns parallel analysis subagents to scan the codebase and write distilled findings to PROJECT.md. You pay the discovery cost once; every subsequent phase reads a summary instead of re-discovering the codebase.
Real native-subagent session data (79 sessions, same architectural pattern as GSD execute phases):
Each subagent writes to cache once, then reads cheaply for 68 turns. The cold start write cost amortizes to $0.014/turn across the subagent’s lifetime.
GSD’s trade-off: multiple cold starts. Each phase spawns a fresh subagent, paying a cache write penalty. For large complex features, this is clearly worth it. For a tiny 3-file change, the cold start overhead isn’t justified. Think of it as: each cold start costs roughly $0.035 — if the phase isolation saves you more than that in avoided noise and correction turns, GSD pays for itself.
Spec-Kit: High-Signal Context by Design
Spec-Kit takes the opposite approach: a single long session with structured spec files that keep the model anchored to project requirements throughout. Instead of resetting the budget between phases, it fills the budget with purposeful signal.
Every /speckit.* command loads a stack of context files:
1. constitution.md ≈ 500–2,000 tokens (governance principles)
2. stack.md ≈ 300–800 tokens (tech stack facts)
3. spec.md ≈ 500–3,000 tokens (feature specification)
4. plan.md ≈ 1,000–5,000 tokens (implementation plan)
5. tasks.md ≈ 500–3,000 tokens (atomic task breakdown)
Cold start baseline: 3,000–14,000 tokens
All of this persists throughout the session. The spec infrastructure serves as a quality anchor: the model is far less likely to drift from project requirements because those requirements are literally in every API call. This is especially valuable for team projects where constitutional constraints (code style, security requirements, architectural boundaries) need to be consistently enforced.
Think of it this way: instead of paying discovery costs every time the model needs to remember project conventions, you pay once at cold start and cache it for the whole session. The spec files aren’t overhead — they’re pre-paid, high-signal context that replaces expensive rediscovery.
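Summing the per-file ranges listed above shows where the cold start baseline comes from. In this sketch the file sizes are the ones from the list; the totals come out at 2,800 and 13,800 tokens, which the baseline rounds to 3,000–14,000. Pricing the prefix at the 5-minute cache-write rate is an assumption about how the first turn is billed.

```python
# (low, high) token estimates per spec file, from the list above.
SPEC_FILES = {
    "constitution.md": (500, 2_000),
    "stack.md": (300, 800),
    "spec.md": (500, 3_000),
    "plan.md": (1_000, 5_000),
    "tasks.md": (500, 3_000),
}
CACHE_WRITE_PER_MTOK = 3.75  # assumed rate for the first (cold) turn

low = sum(lo for lo, _ in SPEC_FILES.values())
high = sum(hi for _, hi in SPEC_FILES.values())
low_cost = low * CACHE_WRITE_PER_MTOK / 1_000_000
high_cost = high * CACHE_WRITE_PER_MTOK / 1_000_000

print(f"spec prefix: {low:,} to {high:,} tokens")
print(f"cold-start write: about ${low_cost:.2f} to ${high_cost:.2f}")
```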
The accumulation curve:
Turn 1: system + spec files loaded = 8,000 tokens
Turn 5: + 4 rg/read results = 14,000 tokens
Turn 10: + more reads + partial impls = 22,000 tokens
Turn 20: + full code files + test runs = 45,000 tokens
Turn 30: approaching context pressure
At high turn counts, the model starts summarizing earlier context — potential drift from original spec requirements. This is the “context rot” GSD solved architecturally by isolating phases. Spec-Kit’s answer is different: keep the signal high enough that noise never dominates.
Built-in mitigations:
- speckit.optimize.tokens: Audits constitution and governance files for token bloat
- speckit.memorylint.run: Prevents AGENTS.md from duplicating constitution.md content
- speckit.archive.run: After a feature merge, compresses spec artifacts into .specify/memory/
- speckit.cleanup.run: Post-implementation review that trims spec file cruft
Cache behavior: constitution.md loaded at turn 1 caches well for the first ~20 turns. Past that, or after a 5-minute TTL expiry, an explicit cache breakpoint helps. tasks.md is the main problem: checkboxes flip as tasks complete, breaking the cache prefix.
Load order matters: constitution → stack → spec → plan → tasks. Most stable files first, so the stable prefix stays cached longest. Never load tasks.md early.
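Why the order matters can be shown with a toy model of prefix caching. The sketch below treats each load as an ordered list of files and counts how many tokens sit before the first file that changed. It is a simplification, not how the API computes cache hits internally. The token counts are borrowed from the Approach C cold start later in this post, and the version numbers and helper names are hypothetical.

```python
def shared_prefix_tokens(before, after):
    """Tokens in the longest common prefix of two context loads.

    Each load is a list of (filename, version, tokens) tuples in load order.
    Caching is prefix-based, so the first changed file ends the reusable part.
    """
    total = 0
    for old, new in zip(before, after):
        if old != new:
            break
        total += old[2]
    return total


def bump_tasks(load):
    """Return the same load with tasks.md edited (version incremented)."""
    return [(n, v + 1, t) if n == "tasks.md" else (n, v, t) for n, v, t in load]


stable_first = [("constitution.md", 1, 800), ("stack.md", 1, 400),
                ("spec.md", 1, 2000), ("plan.md", 1, 1500), ("tasks.md", 1, 800)]
tasks_first = [stable_first[-1]] + stable_first[:-1]

print(shared_prefix_tokens(stable_first, bump_tasks(stable_first)))  # 4700
print(shared_prefix_tokens(tasks_first, bump_tasks(tasks_first)))    # 0
```

With tasks.md last, a checkbox flip leaves 4,700 tokens of prefix reusable; with tasks.md first, nothing after it can be reused.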
Spec-Kit’s trade-off: accumulation without discipline. The spec files grow as the project evolves. Without speckit.archive.run after each merge, the project memory directory fills with stale specs that bloat every future session. Constitution.md should be under 800 tokens; if it’s growing, run speckit.optimize.tokens. The budget discipline has to be active, not passive.
The Write Cost Comparison
GSD and Spec-Kit also differ in how they write — and those writes affect cache behavior differently.
GSD writes at phase boundaries. CONTEXT.md (~1k tokens) and PLAN.md (~2.5k tokens) are generated as output at the end of one subagent and consumed as fresh inputs at the start of the next. No cache invalidation occurs because each subagent starts fresh. The real write cost is in output tokens ($15/MTok) to generate these artifacts — about $0.052 total. Crucially, CONTEXT.md is a distillation: it condenses 5,000 tokens of discuss history into 1,000 tokens of structured signal, saving downstream phases from carrying that noise.
Spec-Kit writes mid-session and across sessions. Within a single speckit implement run, tasks.md edits are tool results appended to the conversation — they don’t invalidate the prefix cache. Clean. But between invocations, the updated tasks.md breaks the cache prefix from that point forward: about $0.006 re-cache cost per new command invocation. Small per-invocation, but spec files also grow. A project 6 months in might load 14,000 tokens of spec infrastructure per command vs 6,800 at the start — doubling cold-start cost without anyone noticing.
| Workflow tool | Write artifacts | Growth profile |
|---|---|---|
| GSD | Ephemeral per-feature — CONTEXT.md + PLAN.md | Stays small (~1–3k tokens/feature) |
| Spec-Kit | Persistent spec files in repo | Grows with project history (6.8k early → 15k+ at scale) |
The Net Budget Impact
Both tools add overhead and remove waste. Whether you come out ahead depends entirely on session length and task complexity.
The overhead is fixed and immediate — it’s the price of admission:
GSD additional cost vs bare Claude:
3 cold starts + CONTEXT.md + PLAN.md output: +$0.107
Spec-Kit additional cost vs bare Claude:
Spec prefix (6.8k) + artifact generation: +$0.047
Per-turn MCP tax if ck+Serena active: +$0.002/turn
The savings compound with complexity. A correction turn at turn 10 (25k context) costs $0.030. A rework sequence at turn 20 (40k context) costs $0.171. Scoped prompts from PLAN.md generate roughly 1,000 fewer tokens per implementation turn than unscoped prompts — $0.015/turn in output savings, or $0.15 over 10 turns.
Breakeven (corrections prevented to recover overhead):
GSD: ~2 correction turns, or ~0.6 rework sequences
Spec-Kit: ~1 correction turn, or ~0.3 rework sequences
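The breakeven figures are the fixed overhead divided by the saving per prevented event. The post does not spell out what one prevented correction turn is worth, so this sketch assumes it saves the $0.030 context cost plus the $0.015 of output, $0.045 in total. With $0.030 alone the correction-turn figures come out higher, at roughly 3.6 for GSD and 1.6 for Spec-Kit. The names are illustrative.

```python
def breakeven(overhead: float, saving_per_event: float) -> float:
    """Number of prevented events needed to recover a fixed overhead."""
    return overhead / saving_per_event


GSD_OVERHEAD = 0.107
SPECKIT_OVERHEAD = 0.047
REWORK_SEQUENCE = 0.171
# Assumption: a prevented correction turn saves its $0.030 context cost
# plus $0.015 of output, i.e. $0.045 in total.
CORRECTION_TURN = 0.030 + 0.015

for name, overhead in [("GSD", GSD_OVERHEAD), ("Spec-Kit", SPECKIT_OVERHEAD)]:
    turns = breakeven(overhead, CORRECTION_TURN)
    reworks = breakeven(overhead, REWORK_SEQUENCE)
    print(f"{name}: {turns:.1f} correction turns, {reworks:.1f} rework sequences")
```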
Here’s what that means by session length:
| | < 5 turns | 5–15 turns | 15–30 turns | 30+ turns |
|---|---|---|---|---|
| GSD | +$0.10+ overhead | ±$0.00 | −$0.05–$0.20 savings | −$0.20–$0.50 savings |
| Spec-Kit | +$0.05+ overhead | −$0.02–$0.08 savings | −$0.05–$0.15 savings | −$0.10–$0.30 savings |
+ = costs more than no tooling; − = costs less
In practice, Spec-Kit’s overhead is so low that a single prevented clarification turn pays for the whole session. GSD requires more task complexity to justify — typically brownfield features that would otherwise drift past turn 15. Neither tool makes sense for tasks under 5 turns.
The Budget Simulation: Four Approaches Head-to-Head
The scenario: add OAuth2 login to a brownfield commerce app (~40k LOC) that already has a custom auth system. This is exactly the kind of task where spending decisions diverge — meaningful discovery is required before implementation, and the tool choice at turn 2 shapes the entire budget from there.
Budget horizon: 10 turns. Model: Claude Sonnet 4.6.
Methodology: Token counts were derived from Claude Code’s actual tool result payloads at realistic codebase sizes (40k LOC, mixed TypeScript/Node.js). Costs use Sonnet 4.6 pricing ($3.00/MTok input, $3.75/MTok cache write, $0.30/MTok cache read, $15.00/MTok output). Empirical data is from 149 real sessions in ~/.claude/session-stats/sessions.csv, analyzed with SQLite.
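For readers who want to run a similar analysis on their own session log, the turn-count bucketing can be sketched with Python’s built-in sqlite3 module. The schema of sessions.csv is not shown in this post, so the table and column names below are assumptions, and the rows are synthetic placeholders rather than real session data. Cost per turn is computed here as total cost over total turns within each bucket, which is one of several reasonable definitions.

```python
import sqlite3

# Synthetic rows for illustration only; the real sessions.csv schema is not
# shown in this post, so these column names are assumptions.
rows = [("s1", 4, 0.09), ("s2", 8, 0.17), ("s3", 20, 0.28), ("s4", 45, 0.70)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sessions (id TEXT, turns INTEGER, cost_usd REAL)")
con.executemany("INSERT INTO sessions VALUES (?, ?, ?)", rows)

query = """
SELECT CASE
         WHEN turns <= 10  THEN '1-10'
         WHEN turns <= 30  THEN '11-30'
         WHEN turns <= 100 THEN '31-100'
         WHEN turns <= 300 THEN '101-300'
         ELSE '300+'
       END AS bucket,
       COUNT(*) AS sessions,
       100.0 * SUM(cost_usd) / SUM(turns) AS cents_per_turn
FROM sessions
GROUP BY bucket
ORDER BY MIN(turns)
"""
buckets = con.execute(query).fetchall()
for bucket, sessions, cents in buckets:
    print(f"{bucket:>7}  {sessions} sessions  {cents:.2f} cents/turn")
```

To use a real export, the same table could be filled from the CSV with the csv module before running the query.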
Four approaches are simulated turn-by-turn:
| Approach | Tooling |
|---|---|
| A — rg + Read | Just Claude. Broad rg queries + whole-file reads. No workflow structure. |
| A’ — ast-grep | Same as A, but structural ast-grep queries replace broad rg calls. No MCPs. |
| B — GSD + Tools | GSD phases (discuss → plan → execute). ck-search + Serena in execute subagent. |
| C — Spec-Kit + Tools | Single Spec-Kit session. Spec files loaded. ck-search + Serena throughout. |
Approach A: rg + Read, No Tools
Discovery relies on broad ripgrep queries and whole-file reads. Everything accumulates in one growing context.
Turn Action New tok Cached Output Turn $ Running $
────────────────────────────────────────────────────────────────────────────────────────────
1 Cold start. System + CLAUDE.md (3k). 3,200 0 400 $0.018 $0.018
Ask: "help me add OAuth to our app"
[cache WRITE: 3,200 × $3.75/MTok]
2 rg "auth" . -C 3 → 8,000 token result 8,200 3,600 500 $0.040 $0.058
[rg returns every auth mention, all files]
[cache WRITE: 8,200 new tokens]
3 Read("src/auth/service.ts") → 4,500 tok 4,700 11,600 600 $0.032 $0.090
[whole file in context]
4 Read("src/auth/middleware.ts") → 3,200 tok 5,400 16,300 700 $0.034 $0.124
rg "passport|jwt" → 2,000 more tokens
[context is now full of auth noise]
5 Read("src/auth/strategies/") → 2,800 tok 3,000 21,700 800 $0.031 $0.155
[model now has ~24k tokens of raw auth content]
6 Plan OAuth approach (model reasons 200 24,500 1,500 $0.030 $0.185
through all the noise to form a plan)
7 Scaffold OAuth config 200 26,200 2,500 $0.047 $0.232
8 Implement passport strategy 200 28,900 3,000 $0.054 $0.286
9 Wire up routes 200 32,100 2,500 $0.048 $0.334
10 Write tests 200 34,800 2,000 $0.042 $0.376
────────────────────────────────────────────────────────────────────────────────────────────
TOTAL 25,500 34,800 14,500 $0.376
Context at turn 10: ~35,000 tokens — mostly rg noise from turns 2–5.
Discovery cost (turns 2–5): $0.137, or 36% of the total budget just to find things.
The model is reasoning through 8,000 tokens of rg noise every single turn from turn 2 onward. That noise never stops costing money, even once it’s stopped being useful.
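The discovery share can be recomputed directly from the table. This sketch copies the per-turn costs from the Turn $ column for Approach A and treats turns 2 through 5 as the discovery turns, leaving turn 1 as the cold start; the variable names are illustrative.

```python
# Per-turn costs for Approach A, copied from the "Turn $" column above.
turn_cost = [0.018, 0.040, 0.032, 0.034, 0.031,
             0.030, 0.047, 0.054, 0.048, 0.042]

total = sum(turn_cost)
discovery = sum(turn_cost[1:5])  # turns 2 through 5
print(f"total: ${total:.3f}")
print(f"discovery (turns 2-5): ${discovery:.3f} ({discovery / total:.0%})")
```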
Approach A’: ast-grep, No MCP
Same scenario, but structural ast-grep queries replace the broad rg calls. No MCP overhead, no semantic index — better tool choice alone.
Turn Action New tok Cached Output Turn $ Running $
───────────────────────────────────────────────────────────────────────────────────────────────────
1 Cold start. System + CLAUDE.md (3k). 3,200 0 400 $0.018 $0.018
Ask: "help me add OAuth to our app"
[cache WRITE: 3,200 × $3.75/MTok]
2 ast-grep --pattern 420 3,600 500 $0.015 $0.033
'class $NAME { $$$ }' src/auth/
+ ast-grep --pattern
'function $NAME(req, res, next) { $$$ }' src/
→ ~420 tokens (class/function signatures only)
[vs 8,000+ from rg "auth" . -C 3 in Approach A]
3 Read("src/auth/service.ts") → 4,500 tok 4,700 4,020 600 $0.025 $0.058
(ast-grep pointed us to exactly this file)
4 ast-grep --pattern '$OBJ.authenticate($$$)' src/ 300 8,720 700 $0.014 $0.072
+ ast-grep --pattern
'passport.use(new $STRATEGY($$$))' src/
→ ~300 tokens (call-sites only, no noise)
[vs rg "passport|jwt" → 2,000+ tokens in A]
5 ast-grep --pattern 'app.use($MIDDLEWARE)' src/ 250 12,020 800 $0.012 $0.084
→ ~250 tokens (middleware registrations only)
[vs Read("src/auth/strategies/") → 2,800 tok in A]
6 Plan OAuth approach (clean structural picture, 200 13,270 1,500 $0.021 $0.105
no raw text noise in context)
7 Scaffold OAuth config 200 14,970 2,500 $0.035 $0.140
8 Implement passport strategy 200 17,670 3,000 $0.047 $0.187
9 Wire up routes 200 20,870 2,500 $0.040 $0.227
10 Write tests 200 23,570 2,000 $0.036 $0.263
───────────────────────────────────────────────────────────────────────────────────────────────────
TOTAL 9,870 23,570 14,500 $0.263
Context at turn 10: ~24,000 tokens — 31% smaller than Approach A.
Discovery cost (turns 1–5): $0.039 — down from $0.137. Same information, 72% less cost.
MCP overhead added: $0.00.
ast-grep returned structure — class boundaries, call sites, middleware registrations — without injecting raw textual noise. The model saw exactly what mattered. The discovery tokens were signal, not noise.
Approach B: GSD + ck-search + Serena
GSD splits into fresh-context phases. The execute subagent starts clean with precision tools active, carrying none of the exploration history from earlier phases.
Turn Phase / Action New tok Cached Output Turn $ Running $
─────────────────────────────────────────────────────────────────────────────────────────────────
── DISCUSS SUBAGENT (fresh cold start) ──
1 PROJECT.md + REQUIREMENTS.md + CLAUDE.md 5,000 0 600 $0.028 $0.028
Cold start: "add OAuth, existing custom auth"
[cache WRITE: 5,000 × $3.75/MTok]
Output: scoped CONTEXT.md distillation
2 Refine scope, confirm assumptions 200 5,600 400 $0.008 $0.036
[cache READ: 5,600 × $0.30/MTok]
── PLAN SUBAGENT (fresh cold start) ──
3 PROJECT.md + CONTEXT.md + REQUIREMENTS.md 6,500 0 800 $0.036 $0.072
[FRESH context — discuss history gone]
Output: XML PLAN.md (3 tasks: config, strategy, routes)
4 Refine plan, confirm task breakdown 200 7,300 600 $0.012 $0.084
── EXECUTE SUBAGENT (fresh cold start, ck + Serena active) ──
5 PROJECT.md + PLAN.md + ck tools + Serena 8,000 0 400 $0.038 $0.122
Cold start. [FRESH context — 8k vs 24k in approach A]
ck --sem "authentication patterns" → 350 tok
Serena find_definition("AuthService") → 90 tok
[440 tokens of discovery vs 8,000+ from rg in approach A]
6 Serena find_references("AuthService") 200 8,840 600 $0.017 $0.139
Read(auth/service.ts lines 80-160 only) → 1,200 tok
[1,200 targeted tokens vs 4,500 for whole file]
7 Plan OAuth approach (clean context, 200 10,640 1,500 $0.026 $0.165
no noise — model has exactly what it needs)
8 Implement OAuth config + passport strategy 200 12,340 3,000 $0.048 $0.213
9 Wire up routes 200 15,540 2,500 $0.040 $0.253
10 Write tests 200 18,240 2,000 $0.034 $0.287
─────────────────────────────────────────────────────────────────────────────────────────────────
TOTAL 20,900 18,240 12,400 $0.287
Context at turn 10 (execute subagent): ~20,000 tokens — 43% smaller than Approach A.
Discovery cost: $0.017. ck found the relevant patterns in 350 tokens. Serena located the exact symbol definition. A targeted Read of 80 lines replaced a full 4,500-token file read.
MCP overhead: ~$0.025 across the session (tool definitions, cached after turn 1 of each subagent).
Approach C: Spec-Kit + ck-search + Serena
Single session. Spec files loaded at cold start. ck + Serena keep discovery tight from the first turn, while the spec infrastructure keeps the model anchored throughout.
Turn Action New tok Cached Output Turn $ Running $
──────────────────────────────────────────────────────────────────────────────────────────────
1 Cold start. Spec files + system + tools: 10,700 0 400 $0.046 $0.046
constitution(800) + stack(400) + spec(2k)
+ plan(1.5k) + tasks(800) + CLAUDE.md(3k)
+ ck tools(700) + Serena tools(1,500)
[Largest cold start of the 4 approaches]
2 ck --sem "authentication patterns" 350 11,100 500 $0.012 $0.058
--threshold 0.7 --limit 8
Returns: 8 × 150-char snippets = 350 tokens
[vs 8,000 from rg in Approach A]
3 Serena find_definition("AuthService") 200 11,950 400 $0.010 $0.068
Serena find_references("AuthService") → 250 tok
[~450 tokens vs reading 3 whole files]
4 Read(auth/service.ts lines 80–160 only) 1,200 12,550 600 $0.017 $0.085
[targeted section, not whole 4,500-tok file]
5 Plan OAuth approach (spec context keeps 200 14,350 1,000 $0.020 $0.105
model anchored to constitution + stack.md)
6 Implement OAuth config 200 15,550 2,000 $0.035 $0.140
7 Implement passport strategy 200 17,750 2,500 $0.043 $0.183
8 Wire up routes 200 20,450 2,000 $0.037 $0.220
9 Write tests 200 22,650 2,000 $0.037 $0.257
10 Update tasks.md (checkbox flip — partial 800 24,850 500 $0.016 $0.273
cache invalidation on tasks prefix)
──────────────────────────────────────────────────────────────────────────────────────────────
TOTAL 14,250 24,850 11,900 $0.273
Context at turn 10: ~26,000 tokens — but 7k of that is stable spec infrastructure, not noise. The spec files add overhead, but they’re cached and purposeful.
Discovery cost: $0.039. The large cold start is the spec file overhead, not discovery noise.
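The per-turn dollar figures in the Approach C table can be reproduced from the pricing listed at the end of this post. The sketch below assumes each turn's new tokens are billed at the cache-write rate, the cached prefix at the cache-read rate, and output at the output rate. That billing split is inferred from the numbers rather than stated, and the function name is only illustrative.

```python
# Pricing in dollars per million tokens (from the pricing note at the end of the post).
CACHE_WRITE = 3.75
CACHE_READ = 0.30
OUTPUT = 15.00


def turn_cost(new_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate one turn's cost, assuming new input is written to cache."""
    return (
        new_tokens * CACHE_WRITE
        + cached_tokens * CACHE_READ
        + output_tokens * OUTPUT
    ) / 1_000_000


# Approach C, turn 1: cold start, nothing cached yet.
cold_start = turn_cost(10_700, 0, 400)
# Approach C, turn 5: small prompt on top of a warm 14,350-token prefix.
warm_turn = turn_cost(200, 14_350, 1_000)

print(f"cold start: ${cold_start:.3f}")
print(f"warm turn:  ${warm_turn:.3f}")
```

The cold start costs more than twice the warm turn while producing less than half the output, which is the cache-write penalty in miniature.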
The Head-to-Head Summary
| | A: rg+Read | A’: ast-grep | B: GSD+Tools | C: Spec-Kit+Tools |
|---|---|---|---|---|
| Total (10 turns) | $0.376 | $0.263 | $0.287 | $0.273 |
| Savings vs A | — | −30% | −24% | −27% |
| Context at T10 | ~35,000 tok | ~24,000 tok | ~20,000 tok | ~26,000 tok |
| Signal/noise ratio | Low | High | Very High | High |
| Discovery cost (T1–5) | $0.137 | $0.039 | $0.017 | $0.039 |
| MCP overhead | $0.00 | $0.00 | ~$0.025 | ~$0.025 |
| Phase cold starts | 1 | 1 | 3 | 1 |
| Projected cost at T20 | ~$0.75+ | ~$0.51 | ~$0.55 | ~$0.50 |
| Projected cost at T50 | likely fails | ~$1.30 | ~$1.30 | ~$1.20 |
| Implementation quality risk | ⚠️ drifts T15+ | ✅ stable | ✅ isolated | ✅ spec-anchored |
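The "Savings vs A" row follows directly from the ten-turn totals. A minimal check, using only the totals in the table (the helper name is illustrative):

```python
# Ten-turn totals from the summary table, in dollars.
totals = {"A": 0.376, "A'": 0.263, "B": 0.287, "C": 0.273}


def savings_vs_baseline(total: float, baseline: float) -> int:
    """Percentage change relative to the baseline, rounded to a whole percent."""
    return round((total - baseline) / baseline * 100)


savings = {
    name: savings_vs_baseline(cost, totals["A"])
    for name, cost in totals.items()
    if name != "A"
}
print(savings)
```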
The Key Budget Insight
A’ beats B at 10 turns despite B using precision semantic tools. Why?
ck-search’s MCP definition overhead costs roughly $0.025 across a 10-turn session ($0.00111/turn × 10 turns ≈ $0.011 in amortized cache reads, plus the initial cache write). ast-grep has zero overhead. When the codebase isn’t large enough for ck’s semantic index to massively outperform structural pattern matching, ast-grep wins on raw cost.
Past turn 20, B and C close the gap. The summary table projects C edging ahead of A’ by turn 20 (~$0.50 vs ~$0.51) and B drawing level by turn 50 (~$1.30 each). ck’s semantic savings compound as more of the codebase is explored, and the MCP overhead gets further amortized. In a 30-turn session on a large brownfield codebase, ck-search saves roughly 25,000 token-equivalents net after all overhead.
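The $0.025 overhead figure can be rebuilt from the per-turn number. The sketch below assumes the $0.00111 per turn is a cache read at $0.30/MTok, which implies roughly 3,700 tokens of tool definitions. That token count is inferred from the arithmetic, not taken from the post.

```python
CACHE_WRITE = 3.75  # dollars per million tokens
CACHE_READ = 0.30

# Inferred: $0.00111 per turn at the cache-read rate implies ~3,700 tokens.
definition_tokens = round(0.00111 / CACHE_READ * 1_000_000)

turns = 10
read_cost = definition_tokens * CACHE_READ / 1_000_000 * turns
write_cost = definition_tokens * CACHE_WRITE / 1_000_000  # paid once, at cold start
overhead = read_cost + write_cost

print(definition_tokens, f"${overhead:.3f}")
```

More than half of the overhead is the one-time write, which is why it amortizes well over longer sessions and poorly over short ones.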
Where Each Dollar Goes
Looking at the cost breakdown by category makes the pattern obvious:
xychart-beta
title "Where Each Dollar Goes — 10-Turn Brownfield (Approach A/A'/B/C)"
x-axis ["A: rg+Read", "A': ast-grep", "B: GSD+ck", "C: Speckit+ck"]
y-axis "Cost ($)" 0 --> 0.40
bar [0.137, 0.039, 0.055, 0.039]
bar [0.030, 0.021, 0.084, 0.046]
bar [0.191, 0.158, 0.148, 0.188]
bar [0.080, 0.005, 0.005, 0.005]
Bar groups per approach: Discovery · Planning/Overhead · Implementation · Noise waste
| Approach | Discovery | Overhead | Impl | Noise | Total |
|---|---|---|---|---|---|
| A: rg + Read | $0.137 (36%) | $0.030 | $0.191 | ~$0.080 | $0.376 |
| A’: ast-grep | $0.039 (15%) | $0.021 | $0.158 | ~$0.005 | $0.263 |
| B: GSD + Tools | $0.055 (19%) | $0.084 | $0.148 | ~$0.005 | $0.287 |
| C: Spec-Kit + Tools | $0.039 (14%) | $0.046 | $0.188 | ~$0.005 | $0.273 |
The ast-grep Discovery Cheat Sheet
For structural queries, the token savings are dramatic enough to warrant a concrete spending comparison. These are the actual query substitutions from the OAuth brownfield scenario:
| Query goal | rg command | rg tokens | ast-grep equivalent | ast-grep tokens |
|---|---|---|---|---|
| All auth-related code | `rg "auth" . -C 3` | 8,000+ | `ast-grep -p 'class $N { $$$ }' src/auth/` | 420 |
| Find middleware functions | `rg "function.*req.*res"` | 2,000 | `ast-grep -p 'function $N(req,res,next) { $$$ }'` | 300 |
| Find passport call-sites | `rg "passport\|jwt"` | 2,000 | `ast-grep -p '$O.authenticate($$$)'` | 300 |
| Find express route registrations | `rg "app\.(get\|post)"` | 3,000 | `ast-grep -p 'app.$M($PATH,$$$)'` | 250 |
| Find try/catch blocks | `rg -F -A5 "try {"` | 5,000 | `ast-grep -p 'try { $$$ } catch ($E) { $$$ }'` | 400 |
| Combined | | 20,000 | | 1,670 |
Roughly 92% fewer tokens (1,670 vs 20,000) for equivalent structural information. ast-grep isn’t a replacement for ck-search (no semantic understanding) or Serena (no symbol graph), but it’s the free upgrade you should always make before reaching for MCP tools.
Some useful structural patterns to keep handy:
# All async function declarations (async arrow functions need their own pattern)
ast-grep --pattern 'async function $NAME($$$) { $$$ }' src/
# All React useEffect with dependency arrays
ast-grep --pattern 'useEffect(() => { $$$ }, [$$$])' src/
# All try-catch blocks (find error handling patterns)
ast-grep --pattern 'try { $$$ } catch ($E) { $$$ }' src/
# All class definitions in a directory
ast-grep --pattern 'class $NAME { $$$ }' src/auth/
# Express route handlers
ast-grep --pattern 'app.$METHOD($PATH, $$$)' src/routes/
What the Real Budget Data Shows
The theoretical models hold up against real data. 149 actual Claude Code sessions across three brownfield projects (commerce, leadingedje, jumpmind) validate the key claims.
Cache Is Delivering the Returns
Confirmed: the theoretical 90% cache savings holds in production. This is real money recovered from what would otherwise be a very different bill.
Cache Hit Rate Grows Over a Session
Session a3352099 (commerce feature, no extra MCPs) was captured at 11 incremental snapshots; the seven below show the cache warming effect in live numbers:
| Turn | Total tokens | Cache read % | Cost/turn |
|---|---|---|---|
| 4 | 73.9k | 56.1% | $0.02025 |
| 19 | 587.1k | 78.9% | $0.01952 |
| 22 | 701.2k | 82.2% | $0.01796 |
| 24 | 779.7k | 83.7% | $0.01764 |
| 35 | 1,290.4k | 86.2% | $0.01805 |
| 97 | 4,757.3k | 91.9% | $0.01752 |
| 108 | 5,552.6k | 92.8% | $0.01741 |
Cost per turn decreases from $0.020 to $0.017 as the session runs, even as total tokens grow 75×. The cache hit rate climbs from 56% to 92.8%. Cache savings outpaced context growth.
The spikes in cost between snapshots correlate with code-generation output bursts, not context accumulation. This is the output-cost paradox made visible: it’s the 0.29% of tokens that are output, not the 94%+ that are cached, driving cost during active coding.
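The two headline ratios can be checked against the first and last rows of the snapshot table. A small sketch using only those two rows:

```python
# First and last snapshots from the session table:
# (turn, total tokens, cost per turn in dollars).
first = (4, 73_900, 0.02025)
last = (108, 5_552_600, 0.01741)

token_growth = last[1] / first[1]
cost_change_pct = (last[2] - first[2]) / first[2] * 100

print(f"tokens grew {token_growth:.0f}x")
print(f"cost per turn changed {cost_change_pct:.0f}%")
```

Total tokens grow about 75× while the per-turn cost falls by about 14%.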
Serena Reduces Total Spend, Not Per-Turn Rate
| | Without Serena | With Serena |
|---|---|---|
| Sessions | 131 | 12 |
| Avg tokens/turn | 40,071 | 34,215 (−15%) |
| Avg cost/turn | 1.78¢ | 1.81¢ |
| Avg cache hit % | 99.17% | 99.98% |
| Avg session cost | $1.79 | $1.02 (−43%) |
Per-turn cost is nearly identical — Serena’s overhead gets baked into the cached prefix. Total session cost is 43% lower because Serena sessions end sooner. Symbol-precision navigation finds what’s needed in fewer turns.
This validates the key point about how to think about these tools: Serena doesn’t save budget per call. It saves budget by shortening the total session. The right mental model is “fewer turns needed,” not “cheaper turns.”
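The "fewer turns" reading can be sanity-checked from the table itself. The sketch below divides average session cost by average per-turn cost to get an implied turn count. This is a rough estimate, since a ratio of averages is not the same as an average of per-session ratios, and the helper name is illustrative.

```python
# Averages from the Serena comparison table.
without_serena = {"session_cost": 1.79, "cost_per_turn": 0.0178}
with_serena = {"session_cost": 1.02, "cost_per_turn": 0.0181}


def implied_turns(stats: dict) -> float:
    """Rough turns per session: average session cost over average per-turn cost."""
    return stats["session_cost"] / stats["cost_per_turn"]


turns_without = implied_turns(without_serena)
turns_with = implied_turns(with_serena)
reduction_pct = (1 - turns_with / turns_without) * 100

print(round(turns_without), round(turns_with), round(reduction_pct))
```

Under that assumption the implied session length drops from about 101 turns to about 56, a reduction of roughly 44% that lines up with the 43% drop in session cost.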
The Extreme Case
The single most expensive session (9893d490, commerce project):
A 44-hour, 1,596-turn session with a 100% cache hit rate throughout. Caching is what made this session practical at all: cache reads cost a tenth of fresh input, so without it the input bill would have been roughly ten times larger. This is what good cache discipline looks like at the extreme end of the budget.
Conversational Sessions: A Different Budget
When you’re using Claude for research, Q&A, or explanation — not coding — the budget flips in interesting ways.
Data from 26 short Q&A-style sessions (1–15 turns, non-subagent) in the real dataset:
Average turns per session: 4.8
Average total tokens: 92,830
Average output %: 0.59% ← tiny relative to total
Average fresh input %: 0.01% ← essentially zero
Average cache hit rate: 88.5% ← lower than coding (cold start penalty)
Average cost per session: $0.078
Average cost per turn: 2.10¢
The standout number is output at 0.59%. In a typical Q&A session with 92k total tokens, only ~550 tokens are actual new output. The other 99.4% is cached infrastructure — system prompt, tool definitions, conversation history.
The Conversational Budget Is Inverted
In coding sessions, output dominates cost — generating 3,000 tokens of code at $15/MTok is expensive. In conversational sessions, cache writes dominate — the cold start infrastructure costs more than all the answers combined:
pie title Q&A Session Cost Structure (26 sessions, $3.47 total)
"Cache writes — cold start overhead" : 78
"Cache reads — conversation history" : 14
"Output — actual answers" : 7
"Fresh input — user messages" : 1
You are paying 11× more to set up the context than to get answers from it.
This is a direct consequence of the cold start penalty. In conversational sessions, the warm-up is most of the bill. Which means the budget advice is simple: don’t start new sessions for follow-up questions. Keep going in the same conversation.
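Both the 11× figure and the ~550 output tokens fall out of numbers already given above. A quick check, using the pie chart shares and the session averages:

```python
# Cost shares (percent) from the Q&A pie chart.
shares = {
    "cache_writes": 78,
    "cache_reads": 14,
    "output": 7,
    "fresh_input": 1,
}

setup_to_answer_ratio = shares["cache_writes"] / shares["output"]

# Average output tokens per session, from the averages listed earlier.
avg_total_tokens = 92_830
avg_output_tokens = avg_total_tokens * 0.59 / 100

print(f"{setup_to_answer_ratio:.1f}x", round(avg_output_tokens))
```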
Medium-length conversational sessions (16–50 turns) are the cheapest per turn at roughly 1.42¢ — the sweet spot where the cold start is fully amortized and context hasn’t grown large enough to dominate.
A 30-turn research conversation — exploring a topic, asking follow-ups, iterating on understanding — costs about $0.95 total. Under a dollar for a thorough deep-dive. Without caching, the same session would cost $6–9.
Your Token Budget Playbook
The decision isn’t “which tool is best.” It’s “which spending pattern fits my current situation.”
Discovery Spending: Choose the Right Tool for the Job
Use rg when: you know exactly what string or symbol you’re looking for.
rg "processPayment" src/ -l # filenames only first
rg "processPayment" src/payments/ # then scope to that directory
The anti-pattern: rg "error" . returns thousands of lines from a brownfield codebase. Always scope to a subdirectory. Always use -l first.
Use ast-grep when: you know the structure of what you’re looking for, not the exact name. Structural refactors, finding all implementations of a code pattern, understanding code shape. Zero MCP overhead and no setup beyond installing the binary. This is your free upgrade from rg before reaching for MCPs.
Use ck-search when: you know the concept but not the symbol name. “Find where error handling is centralized.” “Show me config loading patterns.” Works best in long brownfield sessions where the MCP overhead is amortized and the semantic precision pays off repeatedly.
Optimal agent settings: --threshold 0.7 --limit 10 --snippet-length 150
Use Serena when: you have a specific symbol and need its full impact. Find all callers of processPayment, map the AuthService inheritance tree, list every IRepository implementation. Best ROI chained after ck: ck surfaces the symbol, Serena maps its graph.
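The four "use X when" rules above can be condensed into a small lookup. This is only one way to order them, a sketch with hypothetical parameter names, not a rule the tools themselves enforce:

```python
def choose_discovery_tool(
    knows_exact_string: bool,
    knows_code_shape: bool,
    needs_symbol_graph: bool,
) -> str:
    """Map what you already know about the target onto a discovery tool."""
    if needs_symbol_graph:
        return "Serena"      # callers, inheritance, implementations
    if knows_exact_string:
        return "rg"          # exact string or symbol, scoped to a directory
    if knows_code_shape:
        return "ast-grep"    # structure known, name unknown
    return "ck-search"       # only the concept is known


print(choose_discovery_tool(True, False, False))
print(choose_discovery_tool(False, False, False))
```

The fall-through to ck-search reflects the text's framing: semantic search is the tool for when you can describe the concept but nothing more specific.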
Session Architecture: Choose the Right Budget Model
Use GSD when: the feature is complex enough that context rot is a real risk. More than 15 turns, multiple interdependent files, parallel workstreams. Phase isolation prevents accumulated noise from degrading quality in late-session turns. Think of each phase as its own budget envelope.
Use Spec-Kit when: you’re doing spec-driven work on a long-running team project and need the model consistently anchored to project requirements. The spec files are pre-paid signal: they replace expensive rediscovery at every session start. Run speckit.archive.run after every merge to keep the budget lean.
Use nothing but native tools when: greenfield prototype, simple Q&A, fewer than 5 turns, or you already know exactly which files to change. No cold start overhead, no spec loading, just direct interaction.
The decision tree is roughly: Am I likely to drift past 15 turns in a brownfield codebase? → GSD. Am I doing ongoing feature work that needs consistent project context? → Spec-Kit. Everything else? → native tools.
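Written out as code, the decision tree looks something like the sketch below. The 15-turn threshold comes from the text; the function and parameter names are hypothetical.

```python
def choose_session_architecture(
    brownfield: bool,
    expected_turns: int,
    ongoing_spec_driven_work: bool,
) -> str:
    """Encode the rough decision tree from the text, checked in the same order."""
    if brownfield and expected_turns > 15:
        return "GSD"
    if ongoing_spec_driven_work:
        return "Spec-Kit"
    return "native tools"


print(choose_session_architecture(True, 25, False))
print(choose_session_architecture(True, 8, True))
print(choose_session_architecture(False, 3, False))
```

The order matters: a long brownfield task is routed to GSD even if the project also has specs, which matches the order of the questions in the text.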
MCP Budget: Enable/Disable by Session Type
| Situation | ck-search | Serena | Reason |
|---|---|---|---|
| New greenfield, < 20 files | OFF | OFF | Nothing to search yet |
| Brownfield, exact file known | OFF | OFF | Just Read it |
| Brownfield, concept exploration | ON | OFF | ck saves 5–10× on discovery |
| Brownfield, refactor/blast radius | ON | ON | Serena’s reference maps essential |
| Debugging with stack trace | OFF | OFF | You have the location already |
| Debugging with symptoms only | ON | OFF | ck for concept-based search |
| Short session (< 5 turns) | OFF | OFF | Neither breaks even without amortization |
| Active 20+ turn brownfield session | ON | ON | Full amortization active |
The cleanest implementation is a per-project MCP config. Claude Code reads project-scoped servers from a .mcp.json file at the repository root.
In a brownfield project:
{
  "mcpServers": {
    "ck-search": { "command": "ck", "args": ["--serve"] },
    "serena": { "command": "serena", "args": ["--stdio"] }
  }
}
In a greenfield project, intentionally empty:
{
  "mcpServers": {}
}
MCPs auto-enable for the right projects and stay off everywhere else. No manual toggling required.
The Three Things That Move Your Budget Most
1. Discovery hygiene.
How you find code before changing it determines 30–40% of your session budget in brownfield scenarios. Broad rg queries inject thousands of tokens of noise that persist for the entire session. ast-grep, ck-search, and Serena exist specifically to replace that noise with signal. The cheat-sheet reduction from ast-grep vs rg for structural queries, from 20,000 tokens to 1,670 (roughly 92%), isn’t a rounding error. It’s real money at scale, and it compounds across every turn that carries that context.
2. Caching discipline.
The 90% input cost reduction from prompt caching is the biggest single lever in the system. Stay in sessions rather than restarting. Don’t edit CLAUDE.md mid-session. Don’t add MCPs mid-session. Don’t walk away for more than five minutes: the default prompt cache expires after five minutes without a hit, and the next turn pays for a full cache write. Short, focused sessions are dramatically cheaper than interrupted ones. Real-world data confirms: 94.6% of all tokens were served from cache across 149 sessions, saving over $2,000 compared to what full-price input would have cost.
3. Session architecture.
Individual tool choices optimize turns. Session architecture optimizes the whole budget. GSD’s phase isolation means discovery noise from the discuss phase is gone before execution starts. Spec-Kit’s structured context means the model carries high-signal spec files instead of re-discovering project conventions from scratch. Both solve the same root problem — context accumulation and quality drift — from different architectural angles. Neither makes sense for short simple tasks. Both are clearly worth it once complexity crosses their breakeven threshold.
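To put a number on the caching discipline in item 2: the sketch below compares one warm turn against the same turn after the cache has expired, using the pricing listed at the end of the post. The 20,000-token context is an illustrative size (roughly Approach B's turn-10 context), and the comparison covers input cost only.

```python
CACHE_WRITE = 3.75  # dollars per million tokens
CACHE_READ = 0.30

context_tokens = 20_000  # illustrative context size

warm_cost = context_tokens * CACHE_READ / 1_000_000   # prefix served from cache
cold_cost = context_tokens * CACHE_WRITE / 1_000_000  # prefix rewritten after expiry

print(f"warm: ${warm_cost:.4f}  cold: ${cold_cost:.4f}  ratio: {cold_cost / warm_cost:.1f}x")
```

A single expired cache makes that turn's input about 12.5× more expensive, and the gap widens in absolute terms as the context grows.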
The real cost of Claude Code is not in the tokens it generates. It’s in what you put into context before asking it to generate anything.
Where to Start Spending Smarter
If you take nothing else from this: stop reaching for rg before you know what you’re looking for.
The single highest-ROI change on a brownfield codebase is replacing broad ripgrep queries with ast-grep for structural searches. No MCP setup. No configuration. Just install it and replace rg "function authenticate" with ast-grep -p 'function authenticate($$$) { $$$ }'. The pattern includes the function body because ast-grep patterns have to parse as valid code. The context noise reduction is immediate, and free.
From there, think about your budget in layers:
- Discovery layer: Add ck-search if you’re doing semantic discovery across a large codebase. Add Serena if you need precise symbol navigation and call-graph traversal.
- Caching layer: Keep CLAUDE.md stable mid-session, don’t add MCPs ad hoc, stay in sessions instead of restarting.
- Architecture layer: Add GSD or Spec-Kit if you’re managing complex multi-phase tasks where context drift is a real risk. Think of them as budget envelopes, not just workflow tools.
The tools compound. Discovery hygiene at turn 1 reduces noise for every turn that follows. Clean architecture means phase budgets don’t bleed into each other. And once your cache is warm, the whole system gets cheaper by the turn.
That’s the budget. Spend it well.
Further Reading
- ck-search documentation — Semantic code search with a vector-indexed codebase
- Serena MCP — Language-server-based symbol navigation for Claude Code
- GSD — get-shit-done — Phase-isolated workflow tool for Claude Code
- Spec-Kit — Spec-driven development workflow for Claude Code and GitHub Copilot
- ast-grep — Structural code search and rewriting; no MCP required
- Anthropic prompt caching — Cache mechanics, TTL details, and prefix breakpoint guidance
Tools referenced: ck-search, Serena, GSD, Spec-Kit, ast-grep.
Pricing: Claude Sonnet 4.6 at $3.00/MTok input, $3.75/MTok cache write, $0.30/MTok cache read, $15.00/MTok output.
Empirical data from 149 sessions across three brownfield projects, 2026-03-27 through 2026-04-14.