What It Costs

Understand token pricing, cache economics, and budget controls

7 min read
[Figure: token flow. Input tokens fork into a cache-hit path (90% off, $3 → $0.30/MTok) or a cache-miss path (full price, $3/MTok), merge into processing, and then produce output tokens.]

You set --max-budget-usd 0.001. Claude spends $0.152. That’s a 152x overshoot — and it’s by design. The budget check runs between turns, not mid-generation, so a single turn can blow past any limit you set.

Every Claude CLI call costs money — input tokens, output tokens, and cache operations all have a price. Understanding the minimum per-call overhead (~$0.016 for Opus) and cache economics (90% savings on repeated prompts) is what separates a predictable bill from a surprise.

Token Pricing

Token Pricing (March 2026)

| Token Type | Opus (per 1M) | Sonnet (per 1M) |
|---|---|---|
| Input tokens | $15.00 | $3.00 |
| Output tokens | $75.00 | $15.00 |
| 5-min cache write | $18.75 | $3.75 |
| 1-hour cache write | $30.00 | $6.00 |
| Cache read | $1.50 (90% off!) | $0.30 (90% off!) |

The cache read row is the one that matters most. Once a system prompt or conversation prefix is cached, subsequent calls read it back at one-tenth the input price. This is where the real savings come from.

The Minimum Per-Call Cost

Every call pays a “tax” for the system prompt, regardless of what you actually ask:

  • Opus: ~$0.016 minimum per call
  • Sonnet: ~$0.005 minimum per call

The system prompt is approximately 14,253 cache read tokens. Even a trivial question like “What is 1+1?” costs $0.016 with Opus because those tokens must be read before Claude can respond.

This has a direct consequence: --max-budget-usd 0.01 will always trigger error_max_budget_usd with Opus because the system prompt alone exceeds the budget. Set a minimum of $0.02 for Opus, or better yet, $0.10 to give Claude room to actually answer.

Budget Is Not a Hard Cap

The --max-budget-usd flag is checked between turns, not mid-generation. Here is what actually happens:

  1. Claude starts generating a response
  2. The response completes (full turn)
  3. Then the budget is checked
  4. If exceeded, the next turn does not start

This means a single turn can blow past the budget by any amount. A $0.001 budget can result in $0.15 of spend if the first turn generates a long response.
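
The four steps above can be sketched as a small simulation. This is only an illustration of the between-turns check, not the CLI's actual implementation: `run_session`, the list of per-turn costs, and the strict `>` comparison are all assumptions made for the example.

```python
def run_session(turn_costs, max_budget_usd):
    """Simulate a budget that is only checked after each completed turn.

    turn_costs is a hypothetical list of per-turn costs in USD.
    """
    spent = 0.0
    turns = 0
    for cost in turn_costs:
        # A turn always runs to completion; nothing stops it mid-generation.
        spent += cost
        turns += 1
        # The budget check happens here, between turns (assumed strict '>').
        if spent > max_budget_usd:
            return {
                "subtype": "error_max_budget_usd",
                "num_turns": turns,
                "total_cost_usd": spent,
            }
    return {"subtype": "success", "num_turns": turns, "total_cost_usd": spent}


# One expensive first turn against a tiny budget: the second turn never starts,
# but the first turn's full cost has already been spent.
result = run_session([0.152, 0.05], max_budget_usd=0.001)
print(result["subtype"], result["num_turns"], result["total_cost_usd"])
```

The point of the sketch is that the budget only decides whether the next turn starts; it never shortens the turn that is already running.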

Gotcha

--max-budget-usd is NOT a hard cap. It is checked between turns, not mid-generation. A single turn can exceed the budget by 100x or more. If you set a budget of $0.001 and ask for a 5,000-word essay, Claude will write the entire essay before checking the budget. Always set budgets at least 10x your expected per-turn cost.

Here is a real response from setting a $0.001 budget and asking for a long essay:

Budget Exceeded: $0.001 budget, $0.152 actual (artifacts/17/budget_overshoot.json)

{
  "type": "result",
  "subtype": "error_max_budget_usd",
  "duration_ms": 125999,
  "is_error": false,
  "num_turns": 1,
  "session_id": "7a950bc0-0afb-4754-b66c-d65a5288dbd3",
  "total_cost_usd": 0.1521415,
  "modelUsage": {
    "claude-opus-4-6": {
      "inputTokens": 3,
      "outputTokens": 5446,
      "cacheReadInputTokens": 14253,
      "cacheCreationInputTokens": 1416,
      "costUSD": 0.1521415
    }
  }
}

  • subtype: not "success", because the budget was exceeded
  • total_cost_usd: 152x the $0.001 budget; a single turn can overshoot by any amount
  • outputTokens: 5,446 tokens generated before the budget check

The subtype field changes from "success" to "error_max_budget_usd", but is_error remains false — this is a controlled stop, not a crash. Always check subtype rather than is_error to detect budget limits.
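
A minimal Python sketch of that check, using a trimmed copy of the response above. It also recomputes the cost from the token counts. The per-million-token rates in `ASSUMED_RATES` are assumptions: they are the values that reproduce the reported `costUSD` for this particular response, and they are lower than the Opus column in the pricing table, so confirm current pricing for your model before relying on either set. The helper names are hypothetical.

```python
import json

# Trimmed copy of the response shown above.
RAW = """
{
  "type": "result",
  "subtype": "error_max_budget_usd",
  "is_error": false,
  "num_turns": 1,
  "total_cost_usd": 0.1521415,
  "modelUsage": {
    "claude-opus-4-6": {
      "inputTokens": 3,
      "outputTokens": 5446,
      "cacheReadInputTokens": 14253,
      "cacheCreationInputTokens": 1416,
      "costUSD": 0.1521415
    }
  }
}
"""

# ASSUMPTION: USD per million tokens, chosen because they reproduce costUSD
# for this response. They are not the Opus rates from the pricing table.
ASSUMED_RATES = {
    "inputTokens": 5.00,
    "outputTokens": 25.00,
    "cacheReadInputTokens": 0.50,
    "cacheCreationInputTokens": 6.25,
}


def hit_budget_limit(result):
    """Detect a budget stop via subtype; is_error stays false in this case."""
    return result.get("subtype") == "error_max_budget_usd"


def recompute_cost(usage, rates, skip=()):
    """Sum tokens * rate over the token fields, optionally skipping some."""
    micro_usd = sum(usage[k] * r for k, r in rates.items() if k not in skip)
    return micro_usd / 1_000_000


result = json.loads(RAW)
usage = result["modelUsage"]["claude-opus-4-6"]

stopped = hit_budget_limit(result)
overshoot = result["total_cost_usd"] / 0.001  # budget used in the example
total = recompute_cost(usage, ASSUMED_RATES)
floor = recompute_cost(usage, ASSUMED_RATES, skip=("outputTokens",))

print(f"budget stop: {stopped}, overshoot: {overshoot:.0f}x")
print(f"recomputed: ${total:.7f}, without output tokens: ${floor:.4f}")
```

Under these assumed rates, the cost without output tokens lands close to the ~$0.016 per-call minimum quoted earlier, which suggests that figure covers the cache read plus the cache write rather than the cache read alone.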

[Figure: budget overshoot visualization. Budget $0.001, actual spend $0.152, ratio 152x, 5,446 output tokens. The budget is checked between turns.]

Try This

Set the smallest possible budget and see what happens:

claude -p "What is 2+2?" --output-format json --max-budget-usd 0.001 | jq '{subtype, cost: .total_cost_usd}'

Did it succeed or fail? Check if subtype is "success" or "error_max_budget_usd". Try again with --max-budget-usd 0.10. What’s the minimum budget that actually works with Opus?

Cache Economics

Caching is where the real savings happen. Here is what a typical call looks like under the hood.

First call to a new session:

  • ~14K tokens read from cache (system prompt, cached from a previous session)
  • ~1.4K tokens written to cache (new session-specific content)

Subsequent calls (same session or same system prompt):

  • Almost everything hits cache reads at the 0.1x rate
  • Only new content creates cache entries

Savings calculation for a 10-turn Opus session:

| | Without Caching | With Caching |
|---|---|---|
| Cost | 10 x 15K input tokens x $15/M = $2.25 | 1 x 15K cache write ($30/M, the 1-hour rate) + 9 x 15K cache reads ($1.50/M) = $0.65 |
| Savings | | 71% |

The takeaway: keep conversations in the same session whenever possible. Every resumed turn benefits from cache reads at 90% off.
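
The savings calculation can be reproduced in a few lines of Python. This is a sketch under the same simplification as the table (a flat 15K tokens per turn) and uses the Opus rates from the pricing table. Pairing the $0.65 figure with the 1-hour cache write rate is an inference from the arithmetic, so treat that choice as an assumption; with the 5-minute write rate the cached total would be lower.

```python
TURNS = 10
TOKENS_PER_TURN = 15_000

# Opus rates from the pricing table, USD per million tokens.
INPUT_RATE = 15.00
CACHE_WRITE_1H_RATE = 30.00  # assumption: the rate that yields $0.65
CACHE_READ_RATE = 1.50


def usd(tokens, rate_per_mtok):
    """Convert a token count and a per-million-token rate into USD."""
    return tokens * rate_per_mtok / 1_000_000


# Every turn pays full input price.
without_cache = TURNS * usd(TOKENS_PER_TURN, INPUT_RATE)

# First turn writes the cache; the remaining turns read it back.
with_cache = usd(TOKENS_PER_TURN, CACHE_WRITE_1H_RATE) + (TURNS - 1) * usd(
    TOKENS_PER_TURN, CACHE_READ_RATE
)

savings = 1 - with_cache / without_cache

print(f"without caching: ${without_cache:.2f}")
print(f"with caching:    ${with_cache:.2f}")
print(f"savings:         {savings:.0%}")
```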

[Interactive calculator: token cost. With the slider values shown (10K, 1K, 5, and 50%, giving a 50% cache hit and 50% miss split), the estimated Opus cost is $0.1575 per call, $0.79 per session (5 turns), and $7.88 per day (10 sessions).]
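
The calculator's Opus estimate can be approximated with a short function. This is a guess at the formula, not the calculator's source: it assumes the inputs mean 10K input tokens, 1K output tokens, 5 turns per session, and a 50% cache hit rate, and that cache writes are ignored. Those assumptions reproduce the $0.1575 per-call figure with the Opus rates from the pricing table. The function name is hypothetical.

```python
# Opus rates from the pricing table, USD per million tokens.
OPUS_RATES = {"input": 15.00, "output": 75.00, "cache_read": 1.50}


def per_call_cost(input_tokens, output_tokens, hit_rate, rates):
    """Blend input cost between cache hits and misses, then add output cost.

    ASSUMPTION: cache writes are not charged in this model.
    """
    hit_tokens = input_tokens * hit_rate
    miss_tokens = input_tokens - hit_tokens
    micro_usd = (
        miss_tokens * rates["input"]
        + hit_tokens * rates["cache_read"]
        + output_tokens * rates["output"]
    )
    return micro_usd / 1_000_000


call = per_call_cost(10_000, 1_000, hit_rate=0.5, rates=OPUS_RATES)
session = call * 5   # 5 turns per session
daily = session * 10  # 10 sessions per day

print(f"per call: ${call:.4f}")
print(f"per session: ${session:.4f}")
print(f"daily: ${daily:.4f}")
```

Raising `hit_rate` toward 1.0 shows how much of the per-call cost is input tokens billed at full price.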

5 Cost Optimization Strategies

  1. Use Sonnet for simple tasks — Run claude -p "Format this JSON" --model claude-sonnet-4-6. Sonnet is 5x cheaper per token. Reserve Opus for complex reasoning.

  2. Disable tools for pure text tasks — Run claude -p "Summarize this text" --tools "". No tool descriptions in the system prompt means fewer input tokens.

  3. Use --effort low for simple questions — Run claude -p "What is 2+2?" --effort low. Lower effort means shorter thinking and fewer output tokens.

  4. Set realistic budget caps — Minimum $0.10 for Opus single-turn, $5.00 for multi-turn sessions. Anything below $0.02 for Opus will always trigger a budget error.

  5. Batch similar prompts to maximize cache hits — Run prompts with the same system prompt back-to-back. Subsequent calls benefit from cache reads at 90% off.

Industry Benchmarks

Cost Benchmarks

| Metric | Value |
|---|---|
| Average per dev/day | $6 |
| 90th percentile daily | $12 |
| Monthly estimate | $100-200/dev |
| 100-turn Opus (no cache) | $50-100 |
| 100-turn Opus (with cache) | $10-19 |

The difference between cached and uncached sessions is dramatic — up to 5x cheaper. This is why session management and cache awareness matter so much for teams running Claude at scale.

Tip

Track costs per call by piping to jq: claude -p "…" --output-format json | jq '{cost: .total_cost_usd, tokens: .usage.output_tokens}'
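
If you save each call's JSON output, the same fields can be totalled across many calls. The sketch below assumes one result object per string and uses only fields that appear in this article (`subtype`, `total_cost_usd`, `usage.output_tokens`). The `summarize` helper and the sample data are hypothetical, not real CLI output.

```python
import json


def summarize(json_lines):
    """Total cost, output tokens, and budget stops across result objects."""
    total_cost = 0.0
    total_output = 0
    budget_stops = 0
    for line in json_lines:
        result = json.loads(line)
        # Use defaults so a record with a missing field does not crash the run.
        total_cost += result.get("total_cost_usd", 0.0)
        total_output += result.get("usage", {}).get("output_tokens", 0)
        if result.get("subtype") == "error_max_budget_usd":
            budget_stops += 1
    return {
        "calls": len(json_lines),
        "total_cost_usd": total_cost,
        "output_tokens": total_output,
        "budget_stops": budget_stops,
    }


# Hypothetical sample records for illustration only.
samples = [
    '{"subtype": "success", "total_cost_usd": 0.016, "usage": {"output_tokens": 12}}',
    '{"subtype": "error_max_budget_usd", "total_cost_usd": 0.152, "usage": {"output_tokens": 5446}}',
]

summary = summarize(samples)
print(summary)
```

Counting budget stops separately matters because those calls report is_error as false and would otherwise look like ordinary successes in a cost report.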

30-40K Tokens Consumed Before Your First Prompt

Every session loads 30,000-40,000 tokens before you type anything: system prompt (~14K), tool definitions, CLAUDE.md files, and MCP server schemas. At Opus pricing, that is $0.02-0.06 per session just to start. For Pro plan users, this invisible startup cost burns 1-3% of your daily quota per session. Disable unused MCP servers and use --tools "" for pure text tasks to reduce this overhead.

See This in Action

See how a production system records per-review cost metrics (cost_usd, input_tokens, cache_read_tokens) and builds SQL analytics dashboards in Build an MR Reviewer, Part 6: Handle Errors and Track Costs.

Now Do This

Run claude -p "Hello" --output-format json --max-budget-usd 0.10 | jq '{cost: .total_cost_usd, cached: .usage.cache_read_input_tokens}'. Now you know your baseline cost per call. Run it twice and watch how cache_read_input_tokens jumps on the second call.