Every Claude CLI call costs money — input tokens, output tokens, and cache operations all have a price. The budget flag checks between turns, not mid-generation, so a single expensive turn runs to completion before any limit kicks in. Understanding the minimum per-call overhead (~$0.016 for Opus) and cache economics (90% savings on repeated prompts) is what separates a predictable bill from a surprise.
Token Pricing
Token Pricing
| Token Type | Opus (per 1M) | Sonnet (per 1M) |
|---|---|---|
| Input tokens | $15.00 | $3.00 |
| Output tokens | $75.00 | $15.00 |
| 5-min cache write | $18.75 | $3.75 |
| 1-hour cache write | $30.00 | $6.00 |
| Cache read | $1.50 (90% off!) | $0.30 (90% off!) |
Pricing changes. Check console.anthropic.com/settings/cost for current rates.
The cache read row is the one that matters most. Once a system prompt or conversation prefix is cached, subsequent calls read it back at one-tenth the input price. This is where the real savings come from.
The Minimum Per-Call Cost
Every Opus call costs at least $0.016 — even “What is 1+1?” The system prompt alone is ~14,253 cache read tokens. This means —max-budget-usd 0.01 will always fail with Opus. Set a minimum of $0.02, or better yet, $0.10 to give Claude room to actually answer.
Every call pays a “tax” for the system prompt, regardless of what you actually ask:
- Opus: ~$0.016 minimum per call
- Sonnet: ~$0.005 minimum per call
The system prompt is approximately 14,253 cache read tokens. Even a trivial question like “What is 1+1?” costs $0.016 with Opus because those tokens must be read before Claude can respond.
Budget Is Not a Hard Cap
The --max-budget-usd flag is checked between turns, not mid-generation. Here is what actually happens:
- Claude starts generating a response
- The response completes (full turn)
- Then the budget is checked
- If exceeded, the next turn does not start
This means a single turn can blow past the budget. Set a $1 budget and ask for a complex refactor — Claude may generate a long response costing $2-3 before the budget check fires.
—max-budget-usd is NOT a hard cap. It is checked between turns, not mid-generation. A long first turn completes fully before any budget check runs. Set budgets at least 2-3x your expected per-turn cost to avoid surprises. Minimum practical budget for Opus is $0.10 — anything below $0.02 will always trigger a budget error from the system prompt alone.
Here is what happens when a budget is set below the minimum per-call floor:
The subtype field changes from "success" to "error_max_budget_usd", but is_error remains false — this is a controlled stop, not a crash. Always check subtype rather than is_error to detect budget limits.
Between turns
Test the budget floor yourself:
claude -p “What is 2+2?” —output-format json —max-budget-usd 0.10 | jq ‘{subtype, cost: .total_cost_usd}’
Check if subtype is “success” or “error_max_budget_usd”. Now try —max-budget-usd 0.01 — even a trivial question exceeds it because the system prompt floor is ~$0.016.
Cache Economics
Caching is where the real savings happen. Here is what a typical call looks like.
First call to a new session:
- ~14K tokens read from cache (system prompt, cached from a previous session)
- ~1.4K tokens written to cache (new session-specific content)
Subsequent calls (same session or same system prompt):
- Almost everything hits cache reads at the 0.1x rate
- Only new content creates cache entries
Savings calculation for a 10-turn Opus session:
| Without Caching | With Caching | |
|---|---|---|
| Cost | 10 x 15K input tokens x $15/M = $2.25 | 1 x 15K cache write + 9 x 15K cache reads = $0.65 |
| Savings | — | 71% |
The takeaway: keep conversations in the same session whenever possible. Every resumed turn benefits from cache reads at 90% off.
5 Cost Optimization Strategies
-
Use Sonnet for simple tasks — Run
claude -p "Format this JSON" --model claude-sonnet-4-6. Sonnet is ~40% cheaper per token ($3/$15 vs $5/$25 per MTok). Reserve Opus for complex reasoning. -
Disable tools for pure text tasks — Run
claude -p "Summarize this text" --tools "". No tool descriptions in the system prompt means fewer input tokens. -
Use
--effort lowfor simple questions — Runclaude -p "What is 2+2?" --effort low. Lower effort means shorter thinking and fewer output tokens. -
Set realistic budget caps — Minimum $0.10 for Opus single-turn, $5.00 for multi-turn sessions. Anything below $0.02 for Opus will always trigger a budget error.
-
Batch similar prompts to maximize cache hits — Run prompts with the same system prompt back-to-back. Subsequent calls benefit from cache reads at 90% off.
Industry Benchmarks
Typical usage patterns across teams:
Cost Benchmarks
| Metric | Value |
|---|---|
| Average per dev/day | $6 |
| 90th percentile daily | $12 |
| Monthly estimate | $100-200/dev |
| 100-turn Opus (no cache) | $50-100 |
| 100-turn Opus (with cache) | $10-19 |
The difference between cached and uncached sessions is dramatic — up to 5x cheaper. This is why session management and cache awareness matter so much for teams running Claude at scale.
Track costs per call by piping to jq: claude -p ”…” —output-format json | jq ‘{cost: .total_cost_usd, tokens: .usage.output_tokens}’
Every session loads 30,000-40,000 tokens before you type anything — system prompt (~14K), tool definitions, CLAUDE.md files, and MCP server schemas. At Opus pricing, that is $0.02-0.06 per session just to start. For Pro plan users, this invisible startup cost burns 1-3% of your daily quota per session. Disable unused MCP servers and use —tools "" for pure text tasks to reduce this overhead.
See how a production system records per-review cost metrics (cost_usd, input_tokens, cache_read_tokens) and builds SQL analytics dashboards in Build an MR Reviewer, Part 6: Handle Errors and Track Costs.
Run claude -p “Hello” —output-format json —max-budget-usd 0.10 | jq ‘{cost: .total_cost_usd, cached: .usage.cache_read_input_tokens}’. Now you know your baseline cost per call. Run it twice — watch how cache_read_input_tokens jumps on the second call.