Why --max-budget-usd Is Not a Hard Cap

The Setup

Set a tight budget on an expensive prompt:

claude -p "Write a 5000 word essay about the history of computing" \
  --output-format json \
  --max-budget-usd 0.05

Two minutes later, the full essay came back. The bill: $0.152 — 3x the budget. Claude finished the entire generation before the budget check fired.

Why This Happens

The --max-budget-usd flag is not a hard cap. It is a between-turn check. Here is the actual enforcement sequence:

Claude starts generating a response
The full response completes (one turn)
Then the budget is checked
If exceeded, the next turn does not start

Step 2 is the problem. A single turn runs to completion before any budget check fires. If that turn generates 5,446 output tokens over 126 seconds, the budget flag can only watch.

Here is the JSON response:

{
  "type": "result",
  "subtype": "error_max_budget_usd",
  "is_error": false,
  "total_cost_usd": 0.1521415,
  "modelUsage": {
    "claude-opus-4-6": {
      "outputTokens": 5446,
      "cacheReadInputTokens": 14253,
      "costUSD": 0.1521415
    }
  }
}

Notice is_error is false. This is not a crash. It is a controlled stop that happened after the money was already spent.

The Minimum Floor

Every Claude CLI call has a minimum cost from the system prompt alone. For Opus, that floor is approximately $0.016 per call — about 14,253 cached tokens read before Claude even sees your prompt.

Any budget below this floor is guaranteed to be exceeded:

Budget	Actual Cost	What Happens
$0.01	$0.017	System prompt floor exceeds budget
$0.05	$0.016 - $0.15	Works for short answers, exceeds on long ones
$0.50	$0.02 - $0.48	Reliable for single-turn tasks
$2.00+	Near budget	Budget controls work as expected

Below $0.02 with Opus, the budget flag is just a “run once and stop” mechanism.

The Fix

Set realistic minimums. For Opus single-turn tasks, use at least $0.10. For multi-turn sessions, $1.00 or more.

Detect overshoot in automation. Check the exit code and subtype field:

RESULT=$(claude -p "Your prompt" --output-format json --max-budget-usd 1.00)
SUBTYPE=$(echo "$RESULT" | jq -r '.subtype')
COST=$(echo "$RESULT" | jq '.total_cost_usd')

if [ "$SUBTYPE" = "error_max_budget_usd" ]; then
  echo "Budget exceeded: spent $COST"
fi

Track cumulative costs externally. The budget flag tracks per-session spend, but for CI/CD pipelines running hundreds of calls, you need a wrapper that accumulates total_cost_usd across invocations and kills the pipeline when a threshold is hit.

Use Sonnet for cheap tasks. Sonnet’s minimum per-call cost is $0.005 — roughly 3x cheaper than Opus. For formatting, summarization, and simple questions, --model claude-sonnet-4-6 is the right call.

The Mental Model

Think of --max-budget-usd as a governor on a car, not a brick wall. It limits how many laps you take, but it cannot stop the car mid-lap. Each lap (turn) runs to completion, and the governor only checks between laps.

For hard real-time budget enforcement, you need external tooling. The CLI’s budget flag is a useful guardrail, not a billing guarantee.

The Setup

Why This Happens

The Minimum Floor

The Fix

The Mental Model

Master Claude CLI cost controls