
Why --max-budget-usd Is Not a Hard Cap

The --max-budget-usd flag is checked between turns, not mid-generation. A single expensive turn runs to completion before any limit kicks in. Here's how it actually works.

Tuna Ozmen · 3 min read

The Setup

Set a tight budget on an expensive prompt:

claude -p "Write a 5000 word essay about the history of computing" \
  --output-format json \
  --max-budget-usd 0.05

Two minutes later, the full essay came back. The bill was $0.152, about three times the budget. Claude finished the entire generation before the budget check fired.

Why This Happens

The --max-budget-usd flag is not a hard cap. It is a between-turn check. Here is the actual enforcement sequence:

  1. Claude starts generating a response
  2. The full response completes (one turn)
  3. Then the budget is checked
  4. If exceeded, the next turn does not start

Step 2 is the problem. A single turn runs to completion before any budget check fires. If that turn generates 5,446 output tokens over 126 seconds, the budget flag can only watch.
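The sequence above can be sketched as a toy simulation. This is only a model of the behavior described here, not the CLI's actual implementation: `simulate_budget` is a hypothetical helper, the per-turn costs are made up by the caller, and the strict "greater than" comparison is an assumption.

```bash
#!/usr/bin/env bash
# Toy model of a between-turn budget check. It never calls the claude CLI;
# the per-turn costs are passed in as arguments.

# simulate_budget BUDGET COST...
# Runs turns in order and prints "turns=<n> spent=<usd>" when it stops.
simulate_budget() {
  local budget="$1"; shift
  local spent=0 turns=0 cost
  for cost in "$@"; do
    # Once a turn starts, it runs to completion and is billed in full.
    spent=$(awk -v s="$spent" -v c="$cost" 'BEGIN { printf "%.4f", s + c }')
    turns=$((turns + 1))
    # The only check happens here, after the turn has finished.
    # Assumption: "exceeded" means strictly greater than the budget.
    if awk -v s="$spent" -v b="$budget" 'BEGIN { exit !(s > b) }'; then
      break
    fi
  done
  echo "turns=$turns spent=$spent"
}

simulate_budget 0.05 0.1521 0.0300   # one long turn: stops after it, already over
simulate_budget 0.50 0.0200 0.0300   # two short turns: both run, total stays under
```

In the first call the second turn never starts, but the spend is already three times the budget. That is the whole problem in miniature.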

Here is the JSON response:

{
  "type": "result",
  "subtype": "error_max_budget_usd",
  "is_error": false,
  "total_cost_usd": 0.1521415,
  "modelUsage": {
    "claude-opus-4-6": {
      "outputTokens": 5446,
      "cacheReadInputTokens": 14253,
      "costUSD": 0.1521415
    }
  }
}

Notice is_error is false. This is not a crash. It is a controlled stop that happened after the money was already spent.
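To put a number on the overshoot, the arithmetic can be sketched in a few lines of shell. `overshoot` is a hypothetical helper name. It takes the `total_cost_usd` value from the response and the budget that was passed on the command line.

```bash
# overshoot COST BUDGET
# Prints how far a finished run went past its budget, as an amount and a ratio.
overshoot() {
  awk -v cost="$1" -v budget="$2" 'BEGIN {
    printf "over_by=%.4f ratio=%.2fx\n", cost - budget, cost / budget
  }'
}

overshoot 0.1521415 0.05   # prints: over_by=0.1021 ratio=3.04x
```

In a script, the first argument would typically come from `jq -r '.total_cost_usd'` applied to the saved JSON result.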

The Minimum Floor

Every Claude CLI call has a minimum cost from the system prompt alone. For Opus, that floor is approximately $0.016 per call — about 14,253 cached tokens read before Claude even sees your prompt.

Any budget below this floor is guaranteed to be exceeded:

| Budget | Actual Cost | What Happens |
|--------|-------------|--------------|
| $0.01 | $0.017 | System prompt floor exceeds budget |
| $0.05 | $0.016 - $0.15 | Works for short answers, exceeds on long ones |
| $0.50 | $0.02 - $0.48 | Reliable for single-turn tasks |
| $2.00+ | Near budget | Budget controls work as expected |

Below $0.02 with Opus, the budget flag is just a “run once and stop” mechanism.
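A pre-flight check can catch budgets that are doomed from the start. The sketch below assumes the approximate Opus floor quoted in this article, which is a measured figure rather than an official constant. `budget_clears_floor` is a hypothetical helper.

```bash
# budget_clears_floor BUDGET FLOOR
# Succeeds (exit 0) only if the budget is strictly above the per-call floor.
budget_clears_floor() {
  awk -v b="$1" -v f="$2" 'BEGIN { exit !(b > f) }'
}

OPUS_FLOOR_USD=0.016   # approximate figure from this article; re-measure for your setup

for budget in 0.01 0.05 0.50; do
  if budget_clears_floor "$budget" "$OPUS_FLOOR_USD"; then
    echo "$budget: above the floor"
  else
    echo "$budget: below the floor, will always be exceeded"
  fi
done
```

Clearing the floor only means the call has a chance of staying within budget. A long single turn can still overshoot.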

The Fix

Set realistic minimums. For Opus single-turn tasks, use at least $0.10, and more for long generations: the essay above cost $0.152 in a single turn. For multi-turn sessions, use $1.00 or more.

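Detect overshoot in automation. Check the subtype field of the JSON result, and keep the exit code alongside it for logging:

RESULT=$(claude -p "Your prompt" --output-format json --max-budget-usd 1.00)
STATUS=$?  # exit code of the claude call, captured before anything else runs
SUBTYPE=$(echo "$RESULT" | jq -r '.subtype')
COST=$(echo "$RESULT" | jq '.total_cost_usd')
# The subtype identifies a budget stop; is_error stays false in that case.
if [ "$SUBTYPE" = "error_max_budget_usd" ]; then
  echo "Budget exceeded: spent \$$COST (exit code $STATUS)" >&2
fi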

Track cumulative costs externally. The budget flag tracks per-session spend, but for CI/CD pipelines running hundreds of calls, you need a wrapper that accumulates total_cost_usd across invocations and kills the pipeline when a threshold is hit.
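One possible shape for such a wrapper is a plain-text ledger with one cost per line. This is a minimal sketch, not a hardened tool: `record_cost` and the ledger location are hypothetical, and it assumes calls run one at a time, so parallel jobs would need locking around the file.

```bash
# record_cost LEDGER COST LIMIT
# Appends COST to LEDGER, prints the running total, and returns 1 once the
# total is over LIMIT so the caller can stop the pipeline.
record_cost() {
  local ledger="$1" cost="$2" limit="$3" total
  echo "$cost" >> "$ledger"
  total=$(awk '{ sum += $1 } END { printf "%.4f", sum }' "$ledger")
  echo "$total"
  # Exit status 1 when the total is over the limit, 0 otherwise.
  awk -v t="$total" -v l="$limit" 'BEGIN { exit (t > l) }'
}

LEDGER=$(mktemp)   # hypothetical location; a CI job would use a shared path
record_cost "$LEDGER" 0.1521 0.25 || echo "stop: over limit"
record_cost "$LEDGER" 0.1521 0.25 || echo "stop: over limit"
```

In a real pipeline the cost argument would come from the `total_cost_usd` field of each invocation's JSON result. Like the flag itself, this check runs only after a call has finished. It limits how many calls run, not how much a single call can spend.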

Use Sonnet for cheap tasks. Sonnet's minimum per-call cost is about $0.005, roughly a third of the Opus floor. For formatting, summarization, and simple questions, --model claude-sonnet-4-6 is the right call.

The Mental Model

Think of --max-budget-usd as a governor on a car, not a brick wall. It limits how many laps you take, but it cannot stop the car mid-lap. Each lap (turn) runs to completion, and the governor only checks between laps.

For hard real-time budget enforcement, you need external tooling. The CLI’s budget flag is a useful guardrail, not a billing guarantee.
