1M Context Is Now Standard — Here's What Changed for CLI Users

Opus 4.6 and Sonnet 4.6 now include the full 1M token context window at standard pricing. No beta header, no surcharge. Here's what that means if you're running Claude in scripts and pipelines.

Tuna Ozmen · 4 min read

What Happened

On March 13, 2026, Anthropic made the 1M token context window generally available for Opus 4.6 and Sonnet 4.6. The key change: no pricing premium. A 900K-token request costs the same per-token rate as a 9K one.

This matters if you use the CLI. Before this, any request over 200K input tokens was billed at a premium multiplier. Now it isn’t.

The Old Setup

Before GA, the 1M context was a beta feature with restrictions:

  • Who could use it: Only organizations in usage tier 4 or with custom rate limits
  • How to enable it: You had to send the beta flag context-1m-2025-08-07 in the anthropic-beta header of your API requests
  • What it cost: Requests exceeding 200K input tokens were charged at premium rates — roughly 2x on input and 1.5x on output
  • The 200K threshold: If your total input tokens (including cache reads and writes) crossed 200K, the entire request was billed at premium rates, not just the tokens above the threshold

So a 250K-token request didn’t cost “200K at standard + 50K at premium.” All 250K tokens got the premium rate. That made it expensive to even get close to the boundary.
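The whole-request rule is easy to get wrong in cost estimates. Here is a minimal Python sketch of the beta-era input billing. It assumes the "roughly 2x" input premium is exactly 2.0 and ignores output tokens and cache-pricing differences; the function names are illustrative, not part of any Anthropic SDK.

```python
# Sketch of the beta-era input billing rule (input tokens only).
# Assumption: the "roughly 2x" input premium is modeled as exactly 2.0.
PREMIUM_THRESHOLD = 200_000  # input tokens, including cache reads and writes
INPUT_MULTIPLIER = 2.0       # approximate premium on input


def beta_input_cost(input_tokens: int, rate_per_mtok: float) -> float:
    """Input cost in USD when crossing 200K puts the WHOLE request at premium rates."""
    multiplier = INPUT_MULTIPLIER if input_tokens > PREMIUM_THRESHOLD else 1.0
    # The multiplier applies to every token, not just the ones above 200K.
    return input_tokens * rate_per_mtok * multiplier / 1_000_000


def split_input_cost(input_tokens: int, rate_per_mtok: float) -> float:
    """Hypothetical per-token split (NOT how beta billing worked), for comparison."""
    standard = min(input_tokens, PREMIUM_THRESHOLD)
    premium = max(input_tokens - PREMIUM_THRESHOLD, 0)
    return (standard + premium * INPUT_MULTIPLIER) * rate_per_mtok / 1_000_000


print(beta_input_cost(250_000, 3.0))   # 1.5: all 250K tokens at 2x the $3 rate
print(split_input_cost(250_000, 3.0))  # 0.9: what a per-token split would have charged
```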

What’s Different Now

With Opus 4.6 and Sonnet 4.6, the full 1M window is standard:

| What changed | Before (beta) | Now (GA) |
|---|---|---|
| Pricing over 200K | 2x input, 1.5x output | Standard rates |
| Beta header required | Yes | No (ignored if sent) |
| Availability | Tier 4+ only | All tiers |
| Media limits | 100 images/PDF pages | 600 images/PDF pages |

The pricing is flat across the window:

  • Opus 4.6: $5 input / $25 output per million tokens
  • Sonnet 4.6: $3 input / $15 output per million tokens

No multiplier. A 500K-token Opus request that would have cost $5.00 in input tokens under beta pricing now costs $2.50.
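At GA the arithmetic is one multiplication per token type, with no threshold to check. A sketch, assuming the dictionary keys below are illustrative labels rather than real API model IDs, and ignoring prompt-caching discounts:

```python
# GA rates from the list above, in USD per million tokens.
# The keys are illustrative labels, not API model IDs.
PRICES = {
    "opus-4.6": {"input": 5.00, "output": 25.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
}
CONTEXT_WINDOW = 1_000_000


def request_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Cost in USD of one request at flat GA rates (no prompt-caching discounts)."""
    if input_tokens > CONTEXT_WINDOW:
        raise ValueError("input exceeds the 1M-token context window")
    rates = PRICES[model]
    # Same per-token rate whether the request is 9K or 900K tokens.
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


print(request_cost("opus-4.6", 500_000))  # 2.5, versus 5.0 under the beta's 2x input premium
```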

What This Means for CLI Users

If you run claude -p in scripts, CI, or cron jobs, three things change:

1. Sessions last longer before compaction

Auto-compaction triggers when context fills up. With a 1M window instead of 200K, your sessions can go much longer before Claude starts summarizing earlier turns. For plan-review-execute workflows or multi-step migrations, you’re less likely to lose critical context mid-task.

That said, filling 1M tokens is expensive. Each turn sends the full context as input, so a session carrying 500K tokens of context costs about $2.50 in input alone per turn at Opus rates, before any prompt-caching discounts. The window is bigger, but you’re still paying for every token in it.
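Because every turn sends the whole conversation back as input, input cost compounds as the context grows. A rough sketch, assuming a fixed number of new tokens per turn, no compaction, and no prompt caching (cached reads are billed at a discount, so real costs can be lower); `session_input_cost` is a hypothetical helper:

```python
def session_input_cost(start_tokens: int, growth_per_turn: int,
                       turns: int, rate_per_mtok: float) -> float:
    """Total input cost in USD when every turn resends the full, growing context.

    Assumes no prompt caching and no compaction during the session.
    """
    total = 0.0
    context = start_tokens
    for _ in range(turns):
        total += context * rate_per_mtok / 1_000_000  # this turn's input bill
        context += growth_per_turn  # new messages and tool results join the context
    return total


# A 500K-token context at the $5/M Opus input rate: about $2.50 per turn.
print(session_input_cost(500_000, 0, 1, 5.0))  # 2.5
```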

2. Large codebases fit in a single session

A medium codebase (50-100 files, ~200K tokens of source) used to consume your entire context window. Now it’s 20% of the available space. Claude can hold the full codebase in context and still have room for conversation history, tool results, and reasoning.

For CLI automation that reads multiple files before making decisions — security audits, dependency analysis, cross-module refactoring — this is the biggest practical improvement.
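To check whether a repository fits before pointing a script at it, a rough estimate is usually enough. This sketch assumes about 4 characters per token, a common rule of thumb rather than Claude’s actual tokenizer, and a hypothetical list of source suffixes; use a real token counter when precision matters.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic (assumption); real counts depend on the tokenizer
CONTEXT_WINDOW = 1_000_000


def estimate_tokens(root: str, suffixes: tuple[str, ...] = (".py", ".ts", ".go")) -> int:
    """Approximate token count of all source files under root with the given suffixes."""
    chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            chars += len(path.read_text(encoding="utf-8", errors="ignore"))
    return chars // CHARS_PER_TOKEN


def window_share(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Fraction of the context window a given token count occupies."""
    return tokens / window


# The ~200K-token codebase from the text:
print(window_share(200_000))           # 0.2 of the 1M window
print(window_share(200_000, 200_000))  # 1.0 of the old 200K window
```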

3. MCP tool overhead matters less

Each MCP tool description consumes roughly 200-500 tokens, and a single server can expose several tools. With 20+ servers, that adds up to 10-50K tokens of overhead. Against a 200K window, 50K tokens of tool descriptions consumed 25% of your context. Against 1M, it’s 5%. Still worth trimming, but no longer a crisis.
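The overhead arithmetic above, as a sketch. The tools-per-server count is a hypothetical value chosen for illustration; the per-tool figure comes from the 200-500 token range quoted above.

```python
def tool_overhead(servers: int, tools_per_server: int, tokens_per_tool: int) -> int:
    """Tokens spent on MCP tool descriptions before any real work starts."""
    return servers * tools_per_server * tokens_per_tool


def window_fraction(tokens: int, window: int) -> float:
    """Fraction of a context window consumed by a given number of tokens."""
    return tokens / window


# Hypothetical setup: 20 servers with 5 tools each, 500 tokens per description.
overhead = tool_overhead(20, 5, 500)          # 50,000 tokens
print(window_fraction(overhead, 200_000))     # 0.25 of the old window
print(window_fraction(overhead, 1_000_000))   # 0.05 of the new one
```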

What Didn’t Change

  • --max-budget-usd still checks between turns, so a single turn can overshoot your budget, and nothing caps the size of the overshoot. The 1M window doesn’t fix this. If anything, the potential overshoot on a single turn is larger, since one turn can now carry up to 1M tokens of context.

  • Compaction still happens. The threshold is higher, but eventually you’ll hit it. Keep critical instructions in CLAUDE.md so they survive compaction.

  • Cost per token is the same. More context = more tokens = higher cost per turn. Only the penalty for going over 200K was removed.
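Why a between-turns check can overshoot is easier to see in a toy model. This is not how Claude Code implements --max-budget-usd internally; it only assumes what the text says, namely that the check runs before a turn starts and a turn that has started runs to completion. The per-turn costs are made up.

```python
def spend_with_budget(turn_costs: list[float], max_budget_usd: float) -> float:
    """Toy model: the budget is checked only between turns, never mid-turn."""
    spent = 0.0
    for cost in turn_costs:
        if spent >= max_budget_usd:
            break  # the check fires here, before the next turn starts
        spent += cost  # once a turn starts, its full cost is incurred
    return spent


# Budget of $1.00; the third turn starts at $0.75 spent and costs $2.50 on its own.
print(spend_with_budget([0.50, 0.25, 2.50, 0.50], 1.00))  # 3.25
```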

The Pricing Shift for Opus

Separately from the 1M GA, Opus got cheaper. The older Opus models (Opus 4 and 4.1) were priced at $15/$75 per million tokens; Opus 4.6 is $5/$25.

That’s a 3x reduction on both input and output. At the old $15 input rate, 500K input tokens would cost $7.50 even before any long-context premium; at Opus 4.6’s flat $5 rate, with no premium on top, they cost $2.50. Same context size, a third of the price.

The Sonnet vs Opus gap also narrowed. Sonnet used to be 5x cheaper than Opus ($3/$15 vs $15/$75). Now it’s about 40% cheaper ($3/$15 vs $5/$25). Sonnet is still the right pick for simple tasks, but the cost argument for model routing is less dramatic than it was.
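The ratios in this section can be checked in a few lines. A sketch using only the per-million-token rates quoted above:

```python
# (input, output) in USD per million tokens, as quoted above.
OPUS_OLD = (15.00, 75.00)  # Opus 4 / 4.1
OPUS_NEW = (5.00, 25.00)   # Opus 4.6
SONNET = (3.00, 15.00)     # Sonnet 4.6


def percent_cheaper(cheap: float, expensive: float) -> float:
    """How much cheaper the first rate is, as a percentage of the second."""
    return (expensive - cheap) * 100 / expensive


# Opus price cut: 3x on both input and output.
print([old / new for old, new in zip(OPUS_OLD, OPUS_NEW)])  # [3.0, 3.0]
# Sonnet vs old Opus: 5x cheaper. Sonnet vs Opus 4.6: 40% cheaper.
print(OPUS_OLD[0] / SONNET[0])                               # 5.0
print(percent_cheaper(SONNET[0], OPUS_NEW[0]))               # 40.0
# 500K input tokens: old Opus rate vs Opus 4.6 rate.
print(500_000 * OPUS_OLD[0] / 1_000_000, 500_000 * OPUS_NEW[0] / 1_000_000)  # 7.5 2.5
```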

Bottom Line

If you were avoiding long-context requests because of the pricing premium, that constraint is gone. If you were hitting compaction walls at 200K, you now have 5x the headroom.

Update your scripts: for Opus 4.6 and Sonnet 4.6, any hardcoded 200000 context limit should become 1000000. Any cost estimates based on the old Opus pricing ($15/$75) are about 3x too high.
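One way to retire the hardcoded constant is to key limits by model. A sketch: the model keys are hypothetical labels (substitute whatever IDs your scripts actually pass), and the 200K fallback for unlisted models is an assumption, not a statement about any particular model.

```python
# Context limits keyed by model. Keys are hypothetical labels, not API model IDs.
CONTEXT_LIMITS = {
    "opus-4.6": 1_000_000,
    "sonnet-4.6": 1_000_000,
}
DEFAULT_LIMIT = 200_000  # assumed conservative fallback for models not listed


def context_limit(model: str) -> int:
    """Look up the context limit for a model, falling back to the assumed default."""
    return CONTEXT_LIMITS.get(model, DEFAULT_LIMIT)


def fits(model: str, prompt_tokens: int, reserve_for_output: int = 0) -> bool:
    """True if the prompt plus an output reservation stays within the model's window."""
    return prompt_tokens + reserve_for_output <= context_limit(model)


print(fits("opus-4.6", 750_000, 32_000))  # True
print(fits("some-other-model", 750_000))  # False under the assumed 200K fallback
```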

For the full context management guide — compaction strategies, cache economics, effort levels — see Context Management.
