What Happened
On March 13, 2026, Anthropic made the 1M token context window generally available for Opus 4.6 and Sonnet 4.6. The key change: no pricing premium. A 900K-token request costs the same per-token rate as a 9K one.
This matters if you use the CLI. Before this, anything over 200K tokens hit a multiplier. Now it doesn’t.
The Old Setup
Before GA, the 1M context was a beta feature with restrictions:
- Who could use it: Only organizations in usage tier 4 or with custom rate limits
- How to enable it: You had to pass the `context-1m-2025-08-07` beta header in your API requests
- What it cost: Requests exceeding 200K input tokens were charged at premium rates, roughly 2x on input and 1.5x on output
- The 200K threshold: If your total input tokens (including cache reads and writes) crossed 200K, the entire request was billed at premium rates, not just the tokens above the threshold
So a 250K-token request didn’t cost “200K at standard + 50K at premium.” All 250K tokens got the premium rate. That made it expensive to even get close to the boundary.
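The cliff is easier to see in numbers. The following is a minimal sketch of the beta billing rule as described above, not Anthropic's billing code. The function names are made up for this example, the 2x multiplier is the approximate figure quoted above, and the $3 base rate is only an illustrative input price.

```python
THRESHOLD = 200_000              # input tokens; crossing this repriced the whole request
PREMIUM_INPUT_MULTIPLIER = 2.0   # approximate beta input multiplier quoted in the text


def beta_input_cost(input_tokens: int, base_rate_per_mtok: float) -> float:
    """Input cost under the old rule: every token is repriced once the threshold is crossed."""
    multiplier = PREMIUM_INPUT_MULTIPLIER if input_tokens > THRESHOLD else 1.0
    return input_tokens / 1_000_000 * base_rate_per_mtok * multiplier


def marginal_input_cost(input_tokens: int, base_rate_per_mtok: float) -> float:
    """Hypothetical alternative (NOT how the beta worked): only the excess is premium."""
    standard = min(input_tokens, THRESHOLD)
    premium = max(input_tokens - THRESHOLD, 0)
    weighted_tokens = standard + premium * PREMIUM_INPUT_MULTIPLIER
    return weighted_tokens / 1_000_000 * base_rate_per_mtok


if __name__ == "__main__":
    tokens, rate = 250_000, 3.0  # illustrative request size and base input rate
    print(f"whole-request premium: ${beta_input_cost(tokens, rate):.2f}")
    print(f"marginal premium:      ${marginal_input_cost(tokens, rate):.2f}")
```

At these illustrative numbers the whole-request rule bills $1.50 where a marginal rule would have billed $0.90, which is why staying just under 200K used to be worth real effort.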
What’s Different Now
With Opus 4.6 and Sonnet 4.6, the full 1M window is standard:
| What changed | Before (beta) | Now (GA) |
|---|---|---|
| Pricing over 200K | 2x input, 1.5x output | Standard rates |
| Beta header required | Yes | No (ignored if sent) |
| Availability | Tier 4+ only | All tiers |
| Media limits | 100 images/PDF pages | 600 images/PDF pages |
The pricing is flat across the window:
- Opus 4.6: $5 input / $25 output per million tokens
- Sonnet 4.6: $3 input / $15 output per million tokens
No multiplier. A 500K-token Opus request that would have cost $5.00 in input tokens under beta pricing (0.5M × $5 × 2) now costs $2.50 (0.5M × $5).
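With flat pricing, a cost estimate is a single multiplication per direction. Here is a small sketch using the rates listed above. The dictionary keys are labels chosen for this example, not official API model identifiers, and the function ignores prompt caching and any other discounts.

```python
# USD per million tokens as (input, output), taken from the list above.
# Keys are labels for this sketch only, not API model IDs.
RATES = {
    "opus-4.6": (5.0, 25.0),
    "sonnet-4.6": (3.0, 15.0),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Flat-rate cost of one request: no threshold, no multiplier."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    print(f"${request_cost('opus-4.6', 500_000, 0):.2f}")        # input-only cost of a 500K request
    print(f"${request_cost('sonnet-4.6', 900_000, 4_000):.2f}")  # a long-context Sonnet call
```

Because the rate no longer depends on request size, a 900K-token request costs exactly 100 times what a 9K-token request does.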
What This Means for CLI Users
If you run claude -p in scripts, CI, or cron jobs, three things change:
1. Sessions last longer before compaction
Auto-compaction triggers when context fills up. With a 1M window instead of 200K, your sessions can go much longer before Claude starts summarizing earlier turns. For plan-review-execute workflows or multi-step migrations, you’re less likely to lose critical context mid-task.
That said — filling 1M tokens is expensive. A session that uses 500K tokens of context at Opus rates costs about $2.50 in input alone, per turn. The window is bigger, but you’re still paying for every token in it.
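To see how per-turn billing adds up, here is a rough sketch that sums input cost over a session. It assumes the full context is billed as fresh input on every turn, with no prompt-caching discount, and that context grows by a fixed amount per turn. Both are simplifications, and the function name and parameters are invented for illustration.

```python
def session_input_cost(
    start_tokens: int,
    growth_per_turn: int,
    turns: int,
    rate_per_mtok: float = 5.0,  # Opus 4.6 input rate quoted in this article
) -> float:
    """Total input cost when every turn is billed for the whole context so far."""
    total = 0.0
    context = start_tokens
    for _ in range(turns):
        total += context / 1_000_000 * rate_per_mtok
        context += growth_per_turn  # assumed fixed growth from new messages and tool results
    return total


if __name__ == "__main__":
    # Four turns at a steady 500K tokens of context.
    print(f"${session_input_cost(500_000, 0, 4):.2f}")
    # Five turns growing from 100K to 500K.
    print(f"${session_input_cost(100_000, 100_000, 5):.2f}")
```

Under these assumptions, four turns at 500K tokens come to $10.00 of input, so a long session at high fill gets expensive quickly even without a premium.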
2. Large codebases fit in a single session
A medium codebase (50-100 files, ~200K tokens of source) used to consume your entire context window. Now it’s 20% of the available space. Claude can hold the full codebase in context and still have room for conversation history, tool results, and reasoning.
For CLI automation that reads multiple files before making decisions — security audits, dependency analysis, cross-module refactoring — this is the biggest practical improvement.
3. MCP tool overhead matters less
Each MCP tool description consumes 200-500 tokens. With 20+ servers, each exposing several tools, that’s 10-50K tokens of overhead. Against a 200K window, 50K tokens of tool descriptions consumed 25% of your context. Against 1M, it’s 5%. Still worth trimming, but no longer a crisis.
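The percentages above follow from a simple ratio. The sketch below reproduces them. The count of 100 tool descriptions (for example 20 servers with 5 tools each) is an assumption chosen to reach the 50K upper bound, not a measured figure.

```python
def overhead_share(tool_count: int, tokens_per_tool: int, window: int) -> float:
    """Fraction of the context window consumed by tool descriptions."""
    return tool_count * tokens_per_tool / window


if __name__ == "__main__":
    tools, per_tool = 100, 500  # assumed: 20 servers x 5 tools, 500 tokens per description
    for window in (200_000, 1_000_000):
        share = overhead_share(tools, per_tool, window)
        print(f"{window:>9,} window: {share:.0%} spent on tool descriptions")
```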
What Didn’t Change
- `--max-budget-usd` still checks between turns. A single turn can overshoot your budget by any amount. The 1M window doesn’t fix this; it just means the potential overshoot on a single turn is larger since Claude has more room to generate.
- Compaction still happens. The threshold is higher, but eventually you’ll hit it. Keep critical instructions in CLAUDE.md so they survive compaction.
- Cost per token is the same. More context = more tokens = higher cost per turn. The per-token rate didn’t change, only the penalty for going over 200K was removed.
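The budget caveat in the first bullet is easiest to see with a toy model. The sketch below is a simulation of a between-turn check as described above. It is not the CLI's actual implementation, and the function name and per-turn costs are invented for illustration.

```python
from typing import Iterable, Tuple


def run_with_budget(turn_costs: Iterable[float], max_budget_usd: float) -> Tuple[float, int]:
    """Simulate a budget that is only checked before each turn starts."""
    spent = 0.0
    turns_run = 0
    for cost in turn_costs:
        if spent >= max_budget_usd:
            break          # the check happens here, between turns
        spent += cost      # once a turn starts, its full cost lands regardless of budget
        turns_run += 1
    return spent, turns_run


if __name__ == "__main__":
    # Hypothetical per-turn costs: two cheap turns, then one large long-context turn.
    spent, turns = run_with_budget([0.40, 0.40, 3.00, 0.40], max_budget_usd=1.00)
    print(f"spent ${spent:.2f} over {turns} turns against a $1.00 budget")
```

In this toy run the third turn starts while spending is still under budget, so the session ends at $3.80 against a $1.00 limit. A hard ceiling needs an external guard, such as capping input size before the call.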
The Pricing Shift for Opus
Separately from the 1M GA, Opus 4.6 got a price cut. The previous generation (Opus 4.1/4) was $15/$75 per million tokens. Opus 4.6 is $5/$25.
That’s a 3x reduction. 500K tokens of input priced at the old $15 rate would come to ~$7.50, before any long-context premium; at $5 with no premium, the same input costs ~$2.50, a third of the price.
The Sonnet vs Opus gap also narrowed. Sonnet used to be 5x cheaper than Opus ($3/$15 vs $15/$75). Now it’s about 40% cheaper ($3/$15 vs $5/$25). Sonnet is still the right pick for simple tasks, but the cost argument for model routing is less dramatic than it was.
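The ratios in this section can be checked directly from the listed prices. This is a small sketch using only the figures quoted in the article; the constant and function names are illustrative.

```python
# USD per million tokens as (input, output), as quoted in this article.
OLD_OPUS = (15.0, 75.0)
NEW_OPUS = (5.0, 25.0)
SONNET = (3.0, 15.0)


def times_cheaper(cheaper: float, pricier: float) -> float:
    """How many times more expensive the pricier rate is."""
    return pricier / cheaper


def percent_cheaper(cheaper: float, pricier: float) -> float:
    """Discount of the cheaper rate relative to the pricier one, as a fraction."""
    return 1 - cheaper / pricier


if __name__ == "__main__":
    print(f"Opus price cut:       {times_cheaper(NEW_OPUS[0], OLD_OPUS[0]):.0f}x")
    print(f"Sonnet vs old Opus:   {times_cheaper(SONNET[0], OLD_OPUS[0]):.0f}x cheaper")
    print(f"Sonnet vs Opus 4.6:   {percent_cheaper(SONNET[0], NEW_OPUS[0]):.0%} cheaper")
```

The same ratios hold for output prices, since input and output rates were scaled by the same factor.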
Bottom Line
If you were avoiding long-context requests because of the pricing premium, that constraint is gone. If you were hitting compaction walls at 200K, you now have 5x the headroom.
Update your scripts: any hardcoded 200000 context limits should be 1000000. Any cost estimates based on the old Opus pricing ($15/$75) are about 3x too high.
For the full context management guide — compaction strategies, cache economics, effort levels — see Context Management.