The context limit was never a tech problem. It was a pricing problem
The real story behind Claude's million-token context window isn't the number — it's the price

image from Gemini Imagen 4
When Anthropic quietly removed the long-context premium on Opus 4.6 last week, the change was easy to miss. The company had charged extra for prompts longer than 200,000 tokens since the feature launched. On March 13, 2026, that surcharge disappeared. A million-token request now costs the same per token as a nine-token one: five dollars per million input tokens, no multiplier.
Garry Tan, the Y Combinator partner who has watched thousands of early-stage companies build on AI infrastructure, put it plainly on X: "I underestimated how powerful Opus 4.6 with 1M tokens is. Even last year we were absolutely hitting context limit problems." The comment was not a performance benchmark. It was an economic observation from someone who has seen what happens when a capability becomes cheap enough to use continuously.
What the context limit actually meant in practice
A context window is how much text a model can see in a single prompt. The difference between 200,000 tokens — roughly 150,000 words — and a million tokens is not just a bigger buffer. It is the difference between being able to reason across one large codebase and being able to hold an entire engineering organization's worth of code, documentation, and historical decisions in a single session.
Before last week, using that full context carried a premium most production systems could not justify. A million-token request billed at several times the standard rate, and agentic systems re-send their context on every turn, so a long session could run into hundreds of dollars: fine for occasional analysis, prohibitive for anything that runs continuously. The result was that developers built around the constraint: splitting work into chunks, summarizing between them, losing track of details, writing code to compensate for what the model had forgotten. Anthropic calls this compaction. Developers call it a nightmare.
The pricing change does not make the context window bigger. It makes using the full context economically rational in production systems where it was previously a luxury.
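The arithmetic behind that shift is easy to sketch. The five-dollar flat rate comes from the article; the 2x surcharge multiplier and the turn count below are illustrative assumptions, since the piece does not state the old premium:

```python
FLAT_RATE = 5.00  # USD per million input tokens (the article's figure)

def session_cost(tokens_per_turn, turns, rate=FLAT_RATE, surcharge=1.0):
    """Input-token cost of an agent session that re-sends its full
    context on every turn (output tokens ignored for simplicity)."""
    return tokens_per_turn / 1_000_000 * rate * surcharge * turns

# A 50-turn agent session holding a full million-token context:
old = session_cost(1_000_000, turns=50, surcharge=2.0)  # assumed 2x premium
new = session_cost(1_000_000, turns=50)                 # flat pricing
print(f"old: ${old:.2f}  new: ${new:.2f}")  # old: $500.00  new: $250.00
```

Real bills also depend on output tokens and prompt caching, which this sketch ignores; the point is only that a per-turn surcharge compounds across every turn of a long-running agent.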
What YC companies were running into
Context limits bite hardest in agentic workflows — the kind of systems where an AI model runs autonomously for extended periods, calling tools, reading files, and making decisions across a large problem space. YC startups have been among the most aggressive early adopters of Claude Code and similar tools. When those companies started hitting the ceiling, it was not because their prompts were unusually long. It was because the work itself is long: reviewing an entire pull request diff, running a full codebase audit, analyzing months of production logs.
Sekhsaria, a founding engineer quoted in Anthropic's own release materials, described the problem precisely: large diffs did not fit in a 200K context window, so agents had to chunk context — losing cross-file dependencies in the process. With a million tokens, the full diff feeds in and the agent works from a single picture.
The shift from chunked to full-context reasoning is where the productivity delta lives. Fewer passes, fewer summary steps, fewer places for context to degrade.
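The chunked workflow the old pricing encouraged can be sketched in a few lines. The file names and token counts below are hypothetical; the point is how a 200K budget fragments a diff that a 1M budget holds whole:

```python
# Sketch of the chunking workaround: greedily pack files into groups
# that fit a token budget. Cross-file reasoning then only sees
# per-chunk summaries, which is where dependencies get lost.

def chunk(files, budget):
    """Pack (name, token_count) pairs into groups under the budget."""
    groups, current, used = [], [], 0
    for name, tokens in files:
        if current and used + tokens > budget:
            groups.append(current)
            current, used = [], 0
        current.append(name)
        used += tokens
    if current:
        groups.append(current)
    return groups

files = [("api.py", 120_000), ("models.py", 70_000),
         ("views.py", 160_000), ("tests.py", 110_000)]

# A 460K-token diff against a 200K window needs three passes:
print(chunk(files, budget=200_000))    # [['api.py', 'models.py'], ['views.py'], ['tests.py']]
# Against a 1M window, the whole diff goes in at once:
print(chunk(files, budget=1_000_000))  # [['api.py', 'models.py', 'views.py', 'tests.py']]
```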
The retrieval problem — and how Opus 4.6 solves it
A large context window is only as valuable as the model's ability to actually use what it holds. Long context has a well-documented failure mode called context rot: as the amount of text in a prompt grows, the model's ability to retrieve and reason across the right details degrades. On a standard needle-in-a-haystack test at a million tokens, Sonnet 4.5 retrieved information correctly about 18.5% of the time; Opus 4.6 scores 76%.
That gap is the difference between a model that can technically accept a million tokens and one that can actually find what matters inside them. On MRCR v2, Opus 4.6 scores 78.3%, which Anthropic's own benchmarks put as the highest retrieval accuracy at that context length among frontier models.
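A needle-in-a-haystack test is simple to sketch: bury one fact in a large body of filler, ask the model to retrieve it, and score the answers. The harness below is a minimal illustration with a stub in place of a real model call; the filler sentence, needle, and answer counts are invented:

```python
FILLER = "The quick brown fox jumps over the lazy dog. "
NEEDLE = "The magic number is 7421."

def build_haystack(n_sentences, depth=0.5):
    """Repeat filler text and splice the needle in at a relative
    depth (0.0 = start of the context, 1.0 = end)."""
    sentences = [FILLER] * n_sentences
    sentences.insert(int(n_sentences * depth), NEEDLE + " ")
    return "".join(sentences)

def score(answers):
    """Fraction of model answers that contain the buried fact."""
    return sum("7421" in a for a in answers) / len(answers)

haystack = build_haystack(100_000)  # roughly a million tokens of filler
# A real harness would send `haystack` plus a question such as
# "What is the magic number?" to the model, repeated across depths
# and context sizes. A stub stands in for those calls here, fixed
# to the 76% rate the article reports for Opus 4.6.
stub_answers = [NEEDLE] * 76 + ["I don't know."] * 24
print(score(stub_answers))  # 0.76
```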
What changes now that it costs the same
The use cases that become economically viable at standard pricing are exactly the ones that define production AI systems: running a full codebase audit in one pass, analyzing a year's worth of production incidents without chunking, reviewing an entire contract portfolio in a single session, synthesizing hundreds of research papers.
Anthropic's own case studies from the launch describe the shift in concrete terms. Datadog users saw compaction events drop 15% after the 1M context became standard. Law firms are running full depositions — hundreds of pages — through single sessions. Research organizations are feeding the model entire literature bases.
The pricing change also affects the agent teams feature Anthropic released alongside Opus 4.6. Running multiple AI agents in coordination — one researching, one writing, one fact-checking — across large documents is now economically practical at scale. Previously, coordinating that work across multiple expensive long-context calls would have compounded the cost problem.
The catch
There is one. The million-token context with standard pricing applies to the API and to Claude Code for Max, Team, and Enterprise users. Pro users need to opt in by typing a command. The free tier does not have access.
For individual developers and hobbyists, the unlock is not automatic. For enterprises running production workloads, it arrived last week with no action required.
What Tan's observation points to is the moment when a capability that was technically available becomes financially obvious to use. The context window was there. The price was the barrier. Now it is not.

