What It Costs — Agent Mastered

Every Claude CLI call costs money — input tokens, output tokens, and cache operations all have a price. The budget flag checks between turns, not mid-generation, so a single expensive turn runs to completion before any limit kicks in. Understanding the minimum per-call overhead (~$0.016 for Opus) and cache economics (90% savings on repeated prompts) is what separates a predictable bill from a surprise.

Token Pricing

Token Type	Opus (per 1M)	Sonnet (per 1M)
Input tokens	$15.00	$3.00
Output tokens	$75.00	$15.00
5-min cache write	$18.75	$3.75
1-hour cache write	$30.00	$6.00
Cache read	$1.50 (90% off!)	$0.30 (90% off!)

Tip

Pricing changes. Check console.anthropic.com/settings/cost for current rates.

The cache read row is the one that matters most. Once a system prompt or conversation prefix is cached, subsequent calls read it back at one-tenth the input price. This is where the real savings come from.

The Minimum Per-Call Cost

Minimum Per-Call Cost

Every Opus call costs at least $0.016 — even “What is 1+1?” The system prompt alone is ~14,253 cache read tokens. This means —max-budget-usd 0.01 will always fail with Opus. Set a minimum of $0.02, or better yet, $0.10 to give Claude room to actually answer.

Every call pays a “tax” for the system prompt, regardless of what you actually ask:

Opus: ~$0.016 minimum per call
Sonnet: ~$0.005 minimum per call

The system prompt is approximately 14,253 cache read tokens. Even a trivial question like “What is 1+1?” costs $0.016 with Opus because those tokens must be read before Claude can respond.

Budget Is Not a Hard Cap

The --max-budget-usd flag is checked between turns, not mid-generation. Here is what actually happens:

Claude starts generating a response
The response completes (full turn)
Then the budget is checked
If exceeded, the next turn does not start

This means a single turn can blow past the budget. Set a $1 budget and ask for a complex refactor — Claude may generate a long response costing $2-3 before the budget check fires.

Gotcha

—max-budget-usd is NOT a hard cap. It is checked between turns, not mid-generation. A long first turn completes fully before any budget check runs. Set budgets at least 2-3x your expected per-turn cost to avoid surprises. Minimum practical budget for Opus is $0.10 — anything below $0.02 will always trigger a budget error from the system prompt alone.

Here is what happens when a budget is set below the minimum per-call floor:

Budget Exceeded — budget set below minimum floorartifacts/17/budget_overshoot.json

2 "type": "result",

3 "subtype": "error_max_budget_usd",← A

4 "duration_ms": 125999,

5 "is_error": false,

6 "num_turns": 1,

7 "session_id": "7a950bc0-0afb-4754-b66c-d65a5288dbd3",← B

8 "total_cost_usd": 0.1521415,

9 "modelUsage": {

10 "claude-opus-4-6": {

11 "inputTokens": 3,← C

12 "outputTokens": 5446,

13 "cacheReadInputTokens": 14253,

14 "cacheCreationInputTokens": 1416,

15 "costUSD": 0.1521415

16 }

17 }

18}

ANot 'success' — budget was exceeded

BActual cost: $0.15 — the system prompt floor alone exceeds a micro-budget

C5,446 output tokens generated before budget check fired

The subtype field changes from "success" to "error_max_budget_usd", but is_error remains false — this is a controlled stop, not a crash. Always check subtype rather than is_error to detect budget limits.

Budget OvershootBUDGET VISUALIZATION

Set budget limit

Budget checked HERE ↑
Between turns

$0.000

Budget

$0.050

Actual

$0.152

Ratio

3.04x

Tokens

5,446

▸ Try This

Test the budget floor yourself:

claude -p “What is 2+2?” —output-format json —max-budget-usd 0.10 | jq ‘{subtype, cost: .total_cost_usd}’

Check if subtype is “success” or “error_max_budget_usd”. Now try —max-budget-usd 0.01 — even a trivial question exceeds it because the system prompt floor is ~$0.016.

Cache Economics

Caching is where the real savings happen. Here is what a typical call looks like.

First call to a new session:

~14K tokens read from cache (system prompt, cached from a previous session)
~1.4K tokens written to cache (new session-specific content)

Subsequent calls (same session or same system prompt):

Almost everything hits cache reads at the 0.1x rate
Only new content creates cache entries

Savings calculation for a 10-turn Opus session:

	Without Caching	With Caching
Cost	10 x 15K input tokens x $15/M = $2.25	1 x 15K cache write + 9 x 15K cache reads = $0.65
Savings	—	71%

The takeaway: keep conversations in the same session whenever possible. Every resumed turn benefits from cache reads at 90% off.

Token Cost CalculatorINTERACTIVE CALCULATOR

Models

Input tokens10K

Output tokens1K

Turns per session5

Cache hit rate50%

Cache impact

Hit 50%Miss 50%

ESTIMATED COSTS

Opus 4.6

Per call

$0.0525

Per session (5 turns)

$0.26

Daily (10 sessions)

$2.63

5 Cost Optimization Strategies

Use Sonnet for simple tasks — Run claude -p "Format this JSON" --model claude-sonnet-4-6. Sonnet is ~40% cheaper per token ($3/$15 vs $5/$25 per MTok). Reserve Opus for complex reasoning.
Disable tools for pure text tasks — Run claude -p "Summarize this text" --tools "". No tool descriptions in the system prompt means fewer input tokens.
Use --effort low for simple questions — Run claude -p "What is 2+2?" --effort low. Lower effort means shorter thinking and fewer output tokens.
Set realistic budget caps — Minimum $0.10 for Opus single-turn, $5.00 for multi-turn sessions. Anything below $0.02 for Opus will always trigger a budget error.
Batch similar prompts to maximize cache hits — Run prompts with the same system prompt back-to-back. Subsequent calls benefit from cache reads at 90% off.

Industry Benchmarks

Typical usage patterns across teams:

Cost Benchmarks

Metric	Value
Average per dev/day	$6
90th percentile daily	$12
Monthly estimate	$100-200/dev
100-turn Opus (no cache)	$50-100
100-turn Opus (with cache)	$10-19

The difference between cached and uncached sessions is dramatic — up to 5x cheaper. This is why session management and cache awareness matter so much for teams running Claude at scale.

Tip

Track costs per call by piping to jq: claude -p ”…” —output-format json | jq ‘{cost: .total_cost_usd, tokens: .usage.output_tokens}’

30-40K Tokens Consumed Before Your First Prompt

Every session loads 30,000-40,000 tokens before you type anything — system prompt (~14K), tool definitions, CLAUDE.md files, and MCP server schemas. At Opus pricing, that is $0.02-0.06 per session just to start. For Pro plan users, this invisible startup cost burns 1-3% of your daily quota per session. Disable unused MCP servers and use —tools "" for pure text tasks to reduce this overhead.

See This in Action

See how a production system records per-review cost metrics (cost_usd, input_tokens, cache_read_tokens) and builds SQL analytics dashboards in Build an MR Reviewer, Part 6: Handle Errors and Track Costs.

→ Now Do This

Run claude -p “Hello” —output-format json —max-budget-usd 0.10 | jq ‘{cost: .total_cost_usd, cached: .usage.cache_read_input_tokens}’. Now you know your baseline cost per call. Run it twice — watch how cache_read_input_tokens jumps on the second call.