Skip to content

What It Costs

Understand token pricing, cache economics, and budget controls

7 min read
Token flow: input tokens fork into cache hit or miss paths, merge into processing, then output tokensInput TokensCache Hit90% off$3 → $0.30/MTokCache MissFull Price $3/MTokProcessingOutput Tokens

Every Claude CLI call costs money — input tokens, output tokens, and cache operations all have a price. The budget flag checks between turns, not mid-generation, so a single expensive turn runs to completion before any limit kicks in. Understanding the minimum per-call overhead (~$0.016 for Opus) and cache economics (90% savings on repeated prompts) is what separates a predictable bill from a surprise.

Token Pricing

Token Pricing

Token TypeOpus (per 1M)Sonnet (per 1M)
Input tokens$15.00$3.00
Output tokens$75.00$15.00
5-min cache write$18.75$3.75
1-hour cache write$30.00$6.00
Cache read$1.50 (90% off!)$0.30 (90% off!)
Tip

Pricing changes. Check console.anthropic.com/settings/cost for current rates.

The cache read row is the one that matters most. Once a system prompt or conversation prefix is cached, subsequent calls read it back at one-tenth the input price. This is where the real savings come from.

The Minimum Per-Call Cost

Minimum Per-Call Cost

Every Opus call costs at least $0.016 — even “What is 1+1?” The system prompt alone is ~14,253 cache read tokens. This means —max-budget-usd 0.01 will always fail with Opus. Set a minimum of $0.02, or better yet, $0.10 to give Claude room to actually answer.

Every call pays a “tax” for the system prompt, regardless of what you actually ask:

  • Opus: ~$0.016 minimum per call
  • Sonnet: ~$0.005 minimum per call

The system prompt is approximately 14,253 cache read tokens. Even a trivial question like “What is 1+1?” costs $0.016 with Opus because those tokens must be read before Claude can respond.

Budget Is Not a Hard Cap

The --max-budget-usd flag is checked between turns, not mid-generation. Here is what actually happens:

  1. Claude starts generating a response
  2. The response completes (full turn)
  3. Then the budget is checked
  4. If exceeded, the next turn does not start

This means a single turn can blow past the budget. Set a $1 budget and ask for a complex refactor — Claude may generate a long response costing $2-3 before the budget check fires.

Gotcha

—max-budget-usd is NOT a hard cap. It is checked between turns, not mid-generation. A long first turn completes fully before any budget check runs. Set budgets at least 2-3x your expected per-turn cost to avoid surprises. Minimum practical budget for Opus is $0.10 — anything below $0.02 will always trigger a budget error from the system prompt alone.

Here is what happens when a budget is set below the minimum per-call floor:

Budget Exceeded — budget set below minimum floorartifacts/17/budget_overshoot.json
1{
2 "type": "result",
3 "subtype": "error_max_budget_usd",A
4 "duration_ms": 125999,
5 "is_error": false,
6 "num_turns": 1,
7 "session_id": "7a950bc0-0afb-4754-b66c-d65a5288dbd3",B
8 "total_cost_usd": 0.1521415,
9 "modelUsage": {
10 "claude-opus-4-6": {
11 "inputTokens": 3,C
12 "outputTokens": 5446,
13 "cacheReadInputTokens": 14253,
14 "cacheCreationInputTokens": 1416,
15 "costUSD": 0.1521415
16 }
17 }
18}
ANot 'success' — budget was exceeded
BActual cost: $0.15 — the system prompt floor alone exceeds a micro-budget
C5,446 output tokens generated before budget check fired

The subtype field changes from "success" to "error_max_budget_usd", but is_error remains false — this is a controlled stop, not a crash. Always check subtype rather than is_error to detect budget limits.

Budget OvershootBUDGET VISUALIZATION
Budget checked HERE ↑
Between turns
$0.000
Budget
$0.050
Actual
$0.152
Ratio
3.04x
Tokens
5,446
Try This

Test the budget floor yourself:

claude -p “What is 2+2?” —output-format json —max-budget-usd 0.10 | jq ‘{subtype, cost: .total_cost_usd}’

Check if subtype is “success” or “error_max_budget_usd”. Now try —max-budget-usd 0.01 — even a trivial question exceeds it because the system prompt floor is ~$0.016.

Cache Economics

Caching is where the real savings happen. Here is what a typical call looks like.

First call to a new session:

  • ~14K tokens read from cache (system prompt, cached from a previous session)
  • ~1.4K tokens written to cache (new session-specific content)

Subsequent calls (same session or same system prompt):

  • Almost everything hits cache reads at the 0.1x rate
  • Only new content creates cache entries

Savings calculation for a 10-turn Opus session:

Without CachingWith Caching
Cost10 x 15K input tokens x $15/M = $2.251 x 15K cache write + 9 x 15K cache reads = $0.65
Savings71%

The takeaway: keep conversations in the same session whenever possible. Every resumed turn benefits from cache reads at 90% off.

Token Cost CalculatorINTERACTIVE CALCULATOR
10K
1K
5
50%
Hit 50%Miss 50%
ESTIMATED COSTS
Opus 4.6
Per call
$0.0525
Per session (5 turns)
$0.26
Daily (10 sessions)
$2.63

5 Cost Optimization Strategies

  1. Use Sonnet for simple tasks — Run claude -p "Format this JSON" --model claude-sonnet-4-6. Sonnet is ~40% cheaper per token ($3/$15 vs $5/$25 per MTok). Reserve Opus for complex reasoning.

  2. Disable tools for pure text tasks — Run claude -p "Summarize this text" --tools "". No tool descriptions in the system prompt means fewer input tokens.

  3. Use --effort low for simple questions — Run claude -p "What is 2+2?" --effort low. Lower effort means shorter thinking and fewer output tokens.

  4. Set realistic budget caps — Minimum $0.10 for Opus single-turn, $5.00 for multi-turn sessions. Anything below $0.02 for Opus will always trigger a budget error.

  5. Batch similar prompts to maximize cache hits — Run prompts with the same system prompt back-to-back. Subsequent calls benefit from cache reads at 90% off.

Industry Benchmarks

Typical usage patterns across teams:

Cost Benchmarks

MetricValue
Average per dev/day$6
90th percentile daily$12
Monthly estimate$100-200/dev
100-turn Opus (no cache)$50-100
100-turn Opus (with cache)$10-19

The difference between cached and uncached sessions is dramatic — up to 5x cheaper. This is why session management and cache awareness matter so much for teams running Claude at scale.

Tip

Track costs per call by piping to jq: claude -p ”…” —output-format json | jq ‘{cost: .total_cost_usd, tokens: .usage.output_tokens}’

30-40K Tokens Consumed Before Your First Prompt

Every session loads 30,000-40,000 tokens before you type anything — system prompt (~14K), tool definitions, CLAUDE.md files, and MCP server schemas. At Opus pricing, that is $0.02-0.06 per session just to start. For Pro plan users, this invisible startup cost burns 1-3% of your daily quota per session. Disable unused MCP servers and use —tools "" for pure text tasks to reduce this overhead.

See This in Action

See how a production system records per-review cost metrics (cost_usd, input_tokens, cache_read_tokens) and builds SQL analytics dashboards in Build an MR Reviewer, Part 6: Handle Errors and Track Costs.

Now Do This

Run claude -p “Hello” —output-format json —max-budget-usd 0.10 | jq ‘{cost: .total_cost_usd, cached: .usage.cache_read_input_tokens}’. Now you know your baseline cost per call. Run it twice — watch how cache_read_input_tokens jumps on the second call.