The Test
I ran this command expecting Claude to stop after spending a tenth of a cent:
claude -p "Write a 5000 word essay about the history of computing" \ --output-format json \ --max-budget-usd 0.001Two minutes later, the response came back. Claude had written the entire essay. The bill: $0.152. That is 152 times the budget I set.
Why This Happens
The --max-budget-usd flag is not a hard cap. It is a between-turn check. Here is the actual enforcement sequence:
- Claude starts generating a response
- The full response completes (one turn)
- Then the budget is checked
- If exceeded, the next turn does not start
Step 2 is the problem. A single turn runs to completion before any budget check fires. If that turn generates 5,446 output tokens over 126 seconds, the budget flag can only watch.
Here is the actual JSON response from that test:
{ "type": "result", "subtype": "error_max_budget_usd", "is_error": false, "total_cost_usd": 0.1521415, "modelUsage": { "claude-opus-4-6": { "outputTokens": 5446, "cacheReadInputTokens": 14253, "costUSD": 0.1521415 } }}Notice is_error is false. This is not a crash. It is a controlled stop that happened after the money was already spent.
The Minimum Floor Makes It Worse
Every Claude CLI call has a minimum cost from the system prompt alone. For Opus, that floor is approximately $0.016 per call — about 14,253 cached tokens that must be read before Claude even sees your prompt.
This means setting --max-budget-usd 0.01 will always trigger a budget error with Opus. The system prompt tax exceeds your budget before a single output token is generated.
At micro-budgets, the overshoot ratios are dramatic:
| Budget | Actual Cost | Overshoot |
|---|---|---|
| $0.001 | $0.048 | 48x (short prompt) |
| $0.001 | $0.152 | 152x (long prompt) |
| $0.01 | $0.017 | 1.7x |
| $0.10 | $0.016 | 0.16x (under budget) |
The pattern is clear: below $0.02 with Opus, the budget flag is just a “run once and stop” mechanism.
The Fix
Set realistic minimums. For Opus single-turn tasks, use at least $0.10. For multi-turn sessions, use $5.00 or more.
Detect overshoot in automation. Check the exit code and subtype field:
RESULT=$(claude -p "Your prompt" --output-format json --max-budget-usd 1.00)SUBTYPE=$(echo "$RESULT" | jq -r '.subtype')COST=$(echo "$RESULT" | jq '.total_cost_usd')
if [ "$SUBTYPE" = "error_max_budget_usd" ]; then echo "Budget exceeded: spent $COST"fiTrack cumulative costs externally. The budget flag tracks per-session spend, but for CI/CD pipelines running hundreds of calls, you need a wrapper that accumulates total_cost_usd across invocations and kills the pipeline when a threshold is hit.
Use Sonnet for cheap tasks. Sonnet’s minimum per-call cost is $0.005 — roughly 3x cheaper than Opus. For formatting, summarization, and simple questions, --model claude-sonnet-4-6 is the right call.
The Mental Model
Think of --max-budget-usd as a governor on a car, not a brick wall. It limits how many laps you take, but it cannot stop the car mid-lap. Each lap (turn) runs to completion, and the governor only checks between laps.
For hard real-time budget enforcement, you need external tooling. The CLI’s budget flag is a useful guardrail, not a billing guarantee.