You set --max-budget-usd 0.001. Claude spends $0.152. That’s a 152x overshoot — and it’s by design. The budget check runs between turns, not mid-generation, so a single turn can blow past any limit you set.
Every Claude CLI call costs money — input tokens, output tokens, and cache operations all have a price. Understanding the minimum per-call overhead (~$0.016 for Opus) and cache economics (90% savings on repeated prompts) is what separates a predictable bill from a surprise.
Token Pricing
Token Pricing (March 2026)
| Token Type | Opus (per 1M) | Sonnet (per 1M) |
|---|---|---|
| Input tokens | $15.00 | $3.00 |
| Output tokens | $75.00 | $15.00 |
| 5-min cache write | $18.75 | $3.75 |
| 1-hour cache write | $30.00 | $6.00 |
| Cache read | $1.50 (90% off!) | $0.30 (90% off!) |
The cache read row is the one that matters most. Once a system prompt or conversation prefix is cached, subsequent calls read it back at one-tenth the input price. This is where the real savings come from.
The Minimum Per-Call Cost
Every call pays a “tax” for the system prompt, regardless of what you actually ask:
- Opus: ~$0.016 minimum per call
- Sonnet: ~$0.005 minimum per call
The system prompt is approximately 14,253 cache read tokens. Even a trivial question like “What is 1+1?” costs $0.016 with Opus because those tokens must be read before Claude can respond.
This has a direct consequence: --max-budget-usd 0.01 will always trigger error_max_budget_usd with Opus because the system prompt alone exceeds the budget. Set a minimum of $0.02 for Opus, or better yet, $0.10 to give Claude room to actually answer.
Budget Is Not a Hard Cap
The --max-budget-usd flag is checked between turns, not mid-generation. Here is what actually happens:
- Claude starts generating a response
- The response completes (full turn)
- Then the budget is checked
- If exceeded, the next turn does not start
This means a single turn can blow past the budget by any amount. A $0.001 budget can result in $0.15 of spend if the first turn generates a long response.
—max-budget-usd is NOT a hard cap. It is checked between turns, not mid-generation. A single turn can exceed the budget by 100x or more. If you set a budget of $0.001 and ask for a 5,000-word essay, Claude will write the entire essay before checking the budget. Always set budgets at least 10x your expected per-turn cost.
Here is a real response from setting a $0.001 budget and asking for a long essay:
The subtype field changes from "success" to "error_max_budget_usd", but is_error remains false — this is a controlled stop, not a crash. Always check subtype rather than is_error to detect budget limits.
Between turns
Set the smallest possible budget and see what happens:
claude -p “What is 2+2?” —output-format json —max-budget-usd 0.001 | jq ‘{subtype, cost: .total_cost_usd}’
Did it succeed or fail? Check if subtype is “success” or “error_max_budget_usd”. Try again with —max-budget-usd 0.10 — what’s the minimum budget that actually works with Opus?
Cache Economics
Caching is where the real savings happen. Here is what a typical call looks like under the hood.
First call to a new session:
- ~14K tokens read from cache (system prompt, cached from a previous session)
- ~1.4K tokens written to cache (new session-specific content)
Subsequent calls (same session or same system prompt):
- Almost everything hits cache reads at the 0.1x rate
- Only new content creates cache entries
Savings calculation for a 10-turn Opus session:
| Without Caching | With Caching | |
|---|---|---|
| Cost | 10 x 15K input tokens x $15/M = $2.25 | 1 x 15K cache write + 9 x 15K cache reads = $0.65 |
| Savings | — | 71% |
The takeaway: keep conversations in the same session whenever possible. Every resumed turn benefits from cache reads at 90% off.
5 Cost Optimization Strategies
-
Use Sonnet for simple tasks — Run
claude -p "Format this JSON" --model claude-sonnet-4-6. Sonnet is 5x cheaper per token. Reserve Opus for complex reasoning. -
Disable tools for pure text tasks — Run
claude -p "Summarize this text" --tools "". No tool descriptions in the system prompt means fewer input tokens. -
Use
--effort lowfor simple questions — Runclaude -p "What is 2+2?" --effort low. Lower effort means shorter thinking and fewer output tokens. -
Set realistic budget caps — Minimum $0.10 for Opus single-turn, $5.00 for multi-turn sessions. Anything below $0.02 for Opus will always trigger a budget error.
-
Batch similar prompts to maximize cache hits — Run prompts with the same system prompt back-to-back. Subsequent calls benefit from cache reads at 90% off.
Industry Benchmarks
Cost Benchmarks
| Metric | Value |
|---|---|
| Average per dev/day | $6 |
| 90th percentile daily | $12 |
| Monthly estimate | $100-200/dev |
| 100-turn Opus (no cache) | $50-100 |
| 100-turn Opus (with cache) | $10-19 |
The difference between cached and uncached sessions is dramatic — up to 5x cheaper. This is why session management and cache awareness matter so much for teams running Claude at scale.
Track costs per call by piping to jq: claude -p ”…” —output-format json | jq ‘{cost: .total_cost_usd, tokens: .usage.output_tokens}’
Every session loads 30,000-40,000 tokens before you type anything — system prompt (~14K), tool definitions, CLAUDE.md files, and MCP server schemas. At Opus pricing, that is $0.02-0.06 per session just to start. For Pro plan users, this invisible startup cost burns 1-3% of your daily quota per session. Disable unused MCP servers and use —tools "" for pure text tasks to reduce this overhead.
See how a production system records per-review cost metrics (cost_usd, input_tokens, cache_read_tokens) and builds SQL analytics dashboards in Build an MR Reviewer, Part 6: Handle Errors and Track Costs.
Run claude -p “Hello” —output-format json —max-budget-usd 0.10 | jq ‘{cost: .total_cost_usd, cached: .usage.cache_read_input_tokens}’. Now you know your baseline cost per call. Run it twice — watch how cache_read_input_tokens jumps on the second call.