Agents & Orchestration Intermediate

MAX_THINKING_TOKENS Tuning

Cap extended thinking token budget for predictable CI costs

Command

$ MAX_THINKING_TOKENS=8000 claude -p "Refactor auth.py" --output-format json

Response

{
  "result": "Refactored auth.py...",
  "usage": { "output_tokens": 8000 },
  "total_cost_usd": 0.65
}
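
For CI, the response can be checked against a budget after each run. The following is a minimal sketch, assuming the JSON has the shape of the sample above (`usage.output_tokens` and `total_cost_usd`); real output may carry additional fields. The `summarize_run` helper and its `cost_ceiling_usd` parameter are hypothetical names introduced for illustration.

```python
import json

# Sample payload copied from the Response section of this recipe.
RAW_RESPONSE = """{
  "result": "Refactored auth.py...",
  "usage": { "output_tokens": 8000 },
  "total_cost_usd": 0.65
}"""


def summarize_run(payload: str, cost_ceiling_usd: float) -> dict:
    """Extract token usage and cost, and flag runs that exceed the ceiling.

    Hypothetical helper: the field names are assumed from the sample response.
    """
    data = json.loads(payload)
    # Fall back to zero when a field is absent rather than raising.
    output_tokens = data.get("usage", {}).get("output_tokens", 0)
    cost_usd = data.get("total_cost_usd", 0.0)
    return {
        "output_tokens": output_tokens,
        "cost_usd": cost_usd,
        "over_budget": cost_usd > cost_ceiling_usd,
    }


summary = summarize_run(RAW_RESPONSE, cost_ceiling_usd=1.00)
print(summary)
```

A CI step could fail the job when `over_budget` is true; the $1.00 ceiling here is only an example value.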

Parsing Code

// Extended thinking at Opus pricing ($75/M output):
// 10K tokens = $0.75 per response just for thinking
// 20K tokens = $1.50 per response
// Cap with MAX_THINKING_TOKENS for predictable costs
//
// Combine: --effort high + MAX_THINKING_TOKENS=8000
// = full reasoning within a cost ceiling
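
The arithmetic in the notes above can be reproduced with a short sketch. It assumes the $75 per million output tokens rate quoted in this recipe, which should be checked against current pricing; `thinking_cost_usd` is a hypothetical helper name, not part of any CLI or SDK.

```python
# Rate quoted in this recipe (USD per million output tokens); an assumption
# to verify against current pricing before relying on the numbers.
OPUS_OUTPUT_USD_PER_MILLION = 75.0


def thinking_cost_usd(
    thinking_tokens: int,
    usd_per_million: float = OPUS_OUTPUT_USD_PER_MILLION,
) -> float:
    """Worst-case cost of the thinking portion of one response."""
    return thinking_tokens * usd_per_million / 1_000_000


# Compare the 8000-token cap from the command with the uncapped examples.
for budget in (8_000, 10_000, 20_000):
    print(f"{budget:>6} tokens -> ${thinking_cost_usd(budget):.2f}")
```

With these inputs, 10,000 and 20,000 tokens give the $0.75 and $1.50 figures quoted above, and the 8,000-token cap bounds thinking cost at $0.60 per response.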

Gotchas

! Extended thinking can consume 10K+ tokens per response ($0.75+ at Opus rates)
! Combine with --effort for maximum control: --effort sets quality; MAX_THINKING_TOKENS caps cost
