1M Context Is Now Standard — Here's What Changed for CLI Users

What Happened

On March 13, 2026, Anthropic made the 1M token context window generally available for Opus 4.6 and Sonnet 4.6. The key change: no pricing premium. A 900K-token request is billed at the same per-token rate as a 9K one.

This matters if you use the CLI. Previously, any request over 200K input tokens was billed at a premium multiplier. Now it isn’t.

The Old Setup

Before GA, the 1M context was a beta feature with restrictions:

Who could use it: Only organizations in usage tier 4 or with custom rate limits
How to enable it: You had to pass a context-1m-2025-08-07 beta header in your API requests
What it cost: Requests exceeding 200K input tokens were charged at premium rates — roughly 2x on input and 1.5x on output
The 200K threshold: If your total input tokens (including cache reads and writes) crossed 200K, the entire request was billed at premium rates, not just the tokens above the threshold

So a 250K-token request didn’t cost “200K at standard + 50K at premium.” All 250K tokens got the premium rate. That made it expensive to even get close to the boundary.
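To make the cliff concrete, here is a minimal sketch of the beta-era input billing as described above. The function names, the $3-per-million base rate, and the flat 2x multiplier are illustrative assumptions (the text only says "roughly 2x"); this is not Anthropic's billing code.

```python
# Illustrative sketch of the beta-era "cliff" billing described above.
# The base rate and the exact multiplier are assumptions for the example.

THRESHOLD_TOKENS = 200_000   # beta threshold on total input tokens
INPUT_MULTIPLIER = 2.0       # "roughly 2x" on input, per the text


def beta_input_cost(input_tokens: int, base_rate_per_mtok: float) -> float:
    """Input cost in USD when the whole request flips to the premium rate."""
    rate = base_rate_per_mtok
    if input_tokens > THRESHOLD_TOKENS:
        # The premium applies to every token, not just those past 200K.
        rate *= INPUT_MULTIPLIER
    return input_tokens / 1_000_000 * rate


def marginal_input_cost(input_tokens: int, base_rate_per_mtok: float) -> float:
    """Hypothetical bill if only the excess had been premium (it was not)."""
    standard = min(input_tokens, THRESHOLD_TOKENS)
    excess = max(input_tokens - THRESHOLD_TOKENS, 0)
    weighted_tokens = standard + excess * INPUT_MULTIPLIER
    return weighted_tokens / 1_000_000 * base_rate_per_mtok


if __name__ == "__main__":
    for tokens in (200_000, 250_000):
        cliff = beta_input_cost(tokens, 3.0)
        marginal = marginal_input_cost(tokens, 3.0)
        print(f"{tokens:>7} tokens: cliff ${cliff:.2f}, marginal ${marginal:.2f}")
```

Under these assumed numbers, crossing the threshold by 50K tokens more than doubles the input bill, which is why staying just under 200K was worth real effort.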

What’s Different Now

With Opus 4.6 and Sonnet 4.6, the full 1M window is standard:

What changed	Before (beta)	Now (GA)
Pricing over 200K	2x input, 1.5x output	Standard rates
Beta header required	Yes	No (ignored if sent)
Availability	Tier 4+ only	All tiers
Media limits	100 images/PDF pages	600 images/PDF pages

The pricing is flat across the window:

Opus 4.6: $5 input / $25 output per million tokens
Sonnet 4.6: $3 input / $15 output per million tokens

No multiplier. A 500K-token Opus request that would have cost $5.00 in input tokens under beta pricing now costs $2.50.
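The flat pricing reduces to a single multiplication. A small sketch using the rates listed above; the dictionary keys are shorthand labels for this example, not API model identifiers, and `request_cost` is a hypothetical helper.

```python
# Flat GA rates from the list above, in USD per million tokens.
# The keys are informal labels for this sketch, not real model IDs.
PRICES = {
    "opus-4.6": {"input": 5.0, "output": 25.0},
    "sonnet-4.6": {"input": 3.0, "output": 15.0},
}


def request_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Cost in USD of one request at flat rates (no long-context multiplier)."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    ga = request_cost("opus-4.6", 500_000)
    beta = 2 * ga  # the same input under the old ~2x premium
    print(f"GA: ${ga:.2f}, beta-era equivalent: ${beta:.2f}")
```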

What This Means for CLI Users

If you run claude -p in scripts, CI, or cron jobs, three things change:

1. Sessions last longer before compaction

Auto-compaction triggers when context fills up. With a 1M window instead of 200K, your sessions can go much longer before Claude starts summarizing earlier turns. For plan-review-execute workflows or multi-step migrations, you’re less likely to lose critical context mid-task.

That said — filling 1M tokens is expensive. A session that uses 500K tokens of context at Opus rates costs about $2.50 in input alone, per turn. The window is bigger, but you’re still paying for every token in it.
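The "per turn" part is what adds up. The sketch below totals input cost across a session in which every turn re-sends the whole context. It assumes no prompt caching (cached reads would lower the real figure) and a fixed amount of growth per turn; `session_input_cost` and its parameters are hypothetical.

```python
OPUS_INPUT_PER_MTOK = 5.0  # USD per million input tokens, from the pricing above


def session_input_cost(start_tokens: int, growth_per_turn: int, turns: int) -> float:
    """Total input cost in USD when each turn re-sends the full, uncached context."""
    total = 0.0
    context = start_tokens
    for _ in range(turns):
        total += context / 1_000_000 * OPUS_INPUT_PER_MTOK
        context += growth_per_turn  # history and tool results accumulate
    return total


if __name__ == "__main__":
    # Four turns at a steady 500K tokens of context.
    print(f"${session_input_cost(500_000, 0, 4):.2f}")
    # Five turns growing from 100K to 500K.
    print(f"${session_input_cost(100_000, 100_000, 5):.2f}")
```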

2. Large codebases fit in a single session

A medium codebase (50-100 files, ~200K tokens of source) used to consume your entire context window. Now it’s 20% of the available space. Claude can hold the full codebase in context and still have room for conversation history, tool results, and reasoning.

For CLI automation that reads multiple files before making decisions — security audits, dependency analysis, cross-module refactoring — this is the biggest practical improvement.

3. MCP tool overhead matters less

Each MCP tool description consumes 200-500 tokens. With 20+ servers, each exposing several tools, that adds up to 10-50K tokens of overhead. Against a 200K window, 50K tokens of tool descriptions consumed 25% of your context. Against 1M, it’s 5%. Still worth trimming, but no longer a crisis.
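The percentages above come from a simple ratio. A rough sketch, where the tool count of 100 (for example, 20 servers with 5 tools each) is an assumption chosen to reach the 50K figure and `tool_overhead_share` is a hypothetical helper:

```python
def tool_overhead_share(tool_count: int, tokens_per_tool: int, window_tokens: int) -> float:
    """Fraction of the context window consumed by tool descriptions."""
    return tool_count * tokens_per_tool / window_tokens


if __name__ == "__main__":
    for window in (200_000, 1_000_000):
        share = tool_overhead_share(100, 500, window)
        print(f"{window:>9} window: {share:.0%} spent on tool descriptions")
```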

What Didn’t Change

--max-budget-usd still checks between turns. A single turn can overshoot your budget by any amount. The 1M window doesn’t fix this — it just means the potential overshoot on a single turn is larger since Claude has more room to generate.
Compaction still happens. The threshold is higher, but eventually you’ll hit it. Keep critical instructions in CLAUDE.md so they survive compaction.
Cost per token is the same. More context = more tokens = higher cost per turn. The per-token rate didn’t change; only the penalty for going over 200K was removed.
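The overshoot follows directly from where the check sits. The loop below is a simplified model of a budget that is only checked between turns, as described above. It is not the CLI's actual implementation, and `run_with_budget` and the per-turn costs are made up for illustration.

```python
from typing import Iterable, Tuple


def run_with_budget(turn_costs: Iterable[float], max_budget_usd: float) -> Tuple[int, float]:
    """Run turns until spend reaches the budget, checking only between turns."""
    spent = 0.0
    turns_run = 0
    for cost in turn_costs:
        if spent >= max_budget_usd:
            break          # the check happens before a turn starts
        spent += cost      # a turn in progress is never interrupted
        turns_run += 1
    return turns_run, spent


if __name__ == "__main__":
    # Two cheap turns, then one large turn that starts while under budget.
    turns, spent = run_with_budget([0.40, 0.40, 3.00, 0.40], max_budget_usd=1.00)
    print(f"ran {turns} turns, spent ${spent:.2f} against a $1.00 budget")
```

In this model the third turn starts at $0.80 spent, which is under the limit, so its full cost lands before the next check can stop the session.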

The Pricing Shift for Opus

Separately from the 1M GA, Opus 4.6 got a price cut. The previous generation (Opus 4.1/4) was $15/$75 per million tokens. Opus 4.6 is $5/$25.

That’s a 3x reduction. At the old $15 input rate, 500K input tokens cost ~$7.50 before any long-context premium; at $5 with no premium, the same tokens now cost ~$2.50. Same quality, same context, a third of the price.

The Sonnet vs Opus gap also narrowed. Sonnet used to be 5x cheaper than Opus ($3/$15 vs $15/$75). Now it’s about 40% cheaper ($3/$15 vs $5/$25). Sonnet is still the right pick for simple tasks, but the cost argument for model routing is less dramatic than it was.
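The ratios quoted in this section can be checked with a few lines. The prices are the ones stated above; the helper names are hypothetical.

```python
# (input, output) prices in USD per million tokens, as stated above.
OLD_OPUS = (15.0, 75.0)
NEW_OPUS = (5.0, 25.0)
SONNET = (3.0, 15.0)


def price_ratio(expensive: float, cheap: float) -> float:
    """How many times more the expensive price is than the cheap one."""
    return expensive / cheap


def percent_cheaper(cheap: float, expensive: float) -> float:
    """Discount of the cheap price relative to the expensive one, in percent."""
    return (1 - cheap / expensive) * 100


if __name__ == "__main__":
    print(f"Opus cut: {price_ratio(OLD_OPUS[0], NEW_OPUS[0]):.0f}x")
    print(f"Old Opus vs Sonnet: {price_ratio(OLD_OPUS[0], SONNET[0]):.0f}x")
    print(f"Sonnet vs new Opus: {percent_cheaper(SONNET[0], NEW_OPUS[0]):.0f}% cheaper")
```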

Bottom Line

If you were avoiding long-context requests because of the pricing premium, that constraint is gone. If you were hitting compaction walls at 200K, you now have 5x the headroom.

Update your scripts: any hardcoded 200000 context limits should be 1000000. Any cost estimates based on the old Opus pricing ($15/$75) are about 3x too high.
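One way to find those hardcoded limits is a quick scan for the literal. This is a rough sketch: the regular expression, the `*.py` glob, and the function names are assumptions for illustration. Expect false positives, since it matches any standalone 200000 (also written 200_000 or 200,000), so review each hit by hand.

```python
import re
from pathlib import Path
from typing import Iterator, List, Tuple

# Matches 200000, 200_000, or 200,000 when not embedded in a longer number or name.
LIMIT_PATTERN = re.compile(r"(?<![\w.])200[_,]?000(?![\w.])")


def find_hardcoded_limits(text: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for each line containing the literal."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if LIMIT_PATTERN.search(line)
    ]


def scan_tree(root: str, glob: str = "*.py") -> Iterator[Tuple[Path, int, str]]:
    """Yield (path, line number, line) for every hit under root."""
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in find_hardcoded_limits(text):
            yield path, lineno, line


if __name__ == "__main__":
    for path, lineno, line in scan_tree("."):
        print(f"{path}:{lineno}: {line}")
```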

For the full context management guide — compaction strategies, cache economics, effort levels — see Context Management.