Skip to content

Cost at Scale

Budgets, benchmarks, and optimization for production workloads

9 min read

One developer’s CLI usage costs $6/day. Totally fine. Multiply by 50 developers running unattended agents overnight, and your monthly API bill becomes a line item that needs a budget owner and a hard cap.

When a single developer uses Claude CLI, costs are intuitive — you see the bill and adjust. When ten developers run thousands of API calls per day, costs become a system design problem. This chapter covers the budgeting, model selection, and optimization strategies that keep production workloads predictable and affordable.

See What It Costs for per-call pricing and cache economics.

Startup Token Overhead

Every claude session consumes 30,000-40,000 tokens before you type a single word. This “startup tax” includes:

  • System prompt: ~14,000 tokens (tool definitions, safety rules, output formatting)
  • CLAUDE.md files: 1,000-10,000+ tokens (all files in the hierarchy are loaded)
  • MCP tool descriptions: 0-50,000+ tokens (each tool = 200-500 tokens; 20 servers can add 20K+)

At Opus pricing, this means $0.02-0.06 per session just to start. For Pro plan users, startup alone burns 1-3% of your daily quota. This cost is invisible — it does not appear in the result field, only in the usage.cache_read_input_tokens and usage.cache_creation_input_tokens breakdown.

Mitigation strategies:

  • Disable unused MCP servers — each removed server saves 200-500 tokens per call
  • Use --tools "" for pure text tasks — strips tool definitions from the system prompt
  • Keep CLAUDE.md under 10,000 words — split monorepo configs into subdirectory files
  • Batch related calls to maximize cache hits on the startup overhead
Startup Burns 1-3% of Pro Quota

A single session startup on Pro plan consumes 1-3% of your daily quota before you ask anything. With 30-40K tokens loaded (system prompt + tools + CLAUDE.md), the invisible startup cost is roughly equivalent to a short conversation. Disable unused MCP servers and keep CLAUDE.md lean to minimize this overhead.

Team Cost Benchmarks

Before setting organizational budgets, you need a baseline. Individual costs vary significantly based on model choice, task complexity, caching behavior, and MCP server configuration.

Team Cost Benchmarks

MetricValuePlanning Note
Average per developer/day$6Median across teams
90th percentile daily$12Set alerts at this threshold (90th percentile from same data)
Monthly estimate per developer$100-20022 working days at $5-9/day
10-person team monthly$1,000-2,000Budget for $2,500 to cover spikes
50-person team monthly$5,000-10,000Negotiate volume pricing at this tier

These numbers assume a mix of Opus and Sonnet usage with caching enabled. Without caching, multiply by 3-5x. A 100-turn Opus session costs $50-100 uncached versus $10-19 cached — that difference compounds fast across a team.

One 100-dev team reported a $1,200 spike in a single week — three developers running aggressive session loops without budget caps. The fix was simple: mandatory --max-budget-usd in wrapper scripts and per-dev daily alerts at $15.

Tip

Track per-developer spend with —output-format json and aggregate total_cost_usd across calls. Tools like ccusage can automate this across your organization.

Try This

Measure your actual per-task cost. Run the same prompt 3 times in one session and track costs:

for i in 1 2 3; do claude -p “Summarize this repo” —output-format json | jq ‘{run: ‘$i’, cost: .total_cost_usd, cached: .usage.cache_read_input_tokens}’; done

Watch how cache_read_input_tokens increases on subsequent calls. How much cheaper is run 3 compared to run 1?

Optimization Priorities

This chapter covers team benchmarks and the cost baseline. For actionable optimization strategies, see the deep-dive chapters:

  • Optimization Strategies — model routing (Sonnet vs Opus), effort levels, two-pass strategies, prompt caching, batch processing, session management, and cache TTL awareness

The short version: use Sonnet for anything that doesn’t need deep reasoning. Code review, formatting, summaries — Sonnet handles these at 4-8x lower cost with no quality drop.

Priority order (ranked by typical savings):

  1. Sonnet for simple tasks (70% cost reduction vs Opus for lint/format/summarize)
  2. Budget caps (--max-budget-usd prevents runaway loops)
  3. Cache reuse (stable system prompts, session resume)
  4. Compaction (trim old turns, smaller CLAUDE.md)
  5. Haiku experiments (10x cheaper for trivial tasks, works for <20% of workloads)
See This in Action

See real per-review cost analytics ($0.04-0.08 avg) with SQL dashboards tracking cost_usd, turns, and cache_read_tokens across 500+ reviews in Build an MR Reviewer, Part 6: Handle Errors and Track Costs.

Now Do This

Add cost tracking to one automation script: COST=$(claude -p ”…” —output-format json | jq ‘.total_cost_usd’) and log it to a file. After a week, you’ll know your actual per-task costs — the first step to budgeting at scale.

Budget Controls

A developer’s overnight agent runs 200 API calls while they sleep. There’s no org-wide budget cap, so the bill hits $400 before anyone notices. Budget controls aren’t about individual calls — they’re about organizational policy enforcement at scale.

When a single developer uses Claude CLI, costs are intuitive — you see the bill and adjust. When ten developers run thousands of API calls per day, costs become a system design problem. This section covers the budgeting controls and organizational policies that keep production workloads predictable.

Per-Call Budgets

The --max-budget-usd flag sets a spending limit on a single invocation:

Terminal window
# Cap a code review at $2
claude -p "Review this PR for issues" --output-format json --max-budget-usd 2.00
# Cap a quick lookup at $0.10
claude -p "What does this function do?" --output-format json --max-budget-usd 0.10

In scripts that run unattended, always set a budget. Without one, a multi-turn agent loop can burn through dollars before anyone notices.

Gotcha

The budget is checked between turns, not mid-generation. A single turn can exceed the budget by 100x or more. Set budgets at least 10x your expected per-turn cost. A $0.10 minimum is practical for Opus; anything below $0.02 will always trigger a budget error from the system prompt alone.

Budget vs. Minimum Floor

Every Opus call has a minimum cost of ~$0.02-0.05 (system prompt + one turn). Budgets set below this floor are meaningless — the first turn always completes regardless.

Budget Behavior at Different Levels

BudgetActual CostWhat Happens
Below $0.02$0.02 - $0.05Always exceeds — system prompt floor is higher than the budget
$0.05 - $0.10~$0.02 - $0.05Works for single-turn tasks, tight for multi-turn
$0.50+Near budgetBudget controls work as expected

The practical minimum is $0.10 for single-turn Opus tasks and $0.50+ for multi-turn sessions. Below $0.02, you are not testing budget enforcement — you are testing the system prompt floor.

Try This

Test budget enforcement on a single call:

claude -p “Write a 1000-word essay about testing” —max-budget-usd 0.50 —output-format json | jq ‘{subtype, cost: .total_cost_usd}’

Did the call complete within budget? Now try —max-budget-usd 0.01 to see the floor in action.

Per-Session Budgets

When using --resume to continue a conversation, the budget tracks cumulative cost across all calls in that session:

Terminal window
# First call — uses part of the $5 budget
claude -p "Analyze the codebase" --output-format json --max-budget-usd 5.00
# Resumed call — shares the same $5 budget
claude -p "Now refactor the auth module" --resume $SESSION_ID --max-budget-usd 5.00

A $5 budget shared across 5 resumed calls averages about $1 per call. Plan your per-session budget around the total work you expect, not individual turns.

Organizational Limits

For teams, per-call budgets are not enough. You need a wrapper that enforces organizational policy:

#!/bin/bash
# team-claude.sh — wrapper that enforces org-wide daily limits
DAILY_LIMIT=15.00
DAILY_LOG="/tmp/claude-cost-$(whoami)-$(date +%Y%m%d).log"
# Sum today's spend
SPENT=$(awk '{sum += $1} END {printf "%.4f", sum}' "$DAILY_LOG" 2>/dev/null || echo "0")
REMAINING=$(echo "$DAILY_LIMIT - $SPENT" | bc)
if (( $(echo "$REMAINING <= 0" | bc -l) )); then
echo "Daily budget exhausted ($DAILY_LIMIT). Resets tomorrow." >&2
exit 1
fi
# Run with remaining budget as cap
RESULT=$(claude -p "$@" --output-format json --max-budget-usd "$REMAINING")
COST=$(echo "$RESULT" | jq -r '.total_cost_usd // 0')
echo "$COST" >> "$DAILY_LOG"
echo "$RESULT"

This pattern gives you a daily per-developer cap while still letting individual calls use --max-budget-usd for finer control underneath.

Cost Monitoring

Aggregating Costs Across Calls

Every JSON response includes total_cost_usd. Aggregate this across pipeline steps for real-time cost tracking:

#!/bin/bash
# ci-cost-tracker.sh — aggregate costs across a CI pipeline
COST_LOG="/tmp/ci-costs-$BUILD_ID.log"
run_claude_tracked() {
local label="$1"; shift
RESULT=$(claude -p "$@" --output-format json)
COST=$(echo "$RESULT" | jq -r '.total_cost_usd // 0')
echo "$label: $COST" >> "$COST_LOG"
echo "$RESULT"
}
# Run multiple steps
run_claude_tracked "lint-review" "Check for lint violations" --max-budget-usd 0.50
run_claude_tracked "security-review" "Review for security issues" --max-budget-usd 1.00
run_claude_tracked "doc-gen" "Generate API docs" --max-budget-usd 0.50
# Report total
TOTAL=$(awk -F': ' '{sum += $2} END {printf "%.4f", sum}' "$COST_LOG")
echo "Total pipeline cost: \$$TOTAL"

Budget Alert Thresholds

Set up tiered alerts based on the team cost benchmarks:

Budget Alert Thresholds

ThresholdActionWho Gets Notified
Developer exceeds $12/daySlack notificationDeveloper + team lead
Team exceeds 80% of monthly budgetWarning emailTeam lead + finance
CI pipeline exceeds $5/runPipeline annotationPR author
Single call exceeds $2Audit log entryAutomated review

Budget controls limit spending. To also reduce spending, see Optimization Strategies for routing tasks to cheaper models and maximizing cache hits.

Now Do This

Set —max-budget-usd 0.50 on your next automation run. Check the subtype field in the response — did it complete as “success” or hit “error_max_budget_usd”? This is the simplest budget control: a per-call cap that prevents runaway spending.