When a lab ships a product with a hard daily execution limit — five runs per day for Pro users, fifteen for Max, twenty-five for Team and Enterprise — that limit is itself a statement about what the infrastructure can actually sustain. The limit is new. A report published April 21 by Datadog, which monitors AI systems in production at thousands of companies, quantifies why: across its customer base in February, roughly 5 percent of all AI model requests failed, and nearly 60 percent of those failures traced to compute capacity rather than model quality. Token consumption per request — the amount of text the model processes in a single call — more than doubled for median teams and quadrupled for heavy users over the past year. GPU clusters take years to build, and the supply is not keeping up with demand.
The first concrete architectural response from a major lab arrived in a product called Routines, which VentureBeat tested and reviewed on April 22. Routines moves AI task execution from the user's local machine to the lab's own cloud infrastructure, so tasks like automated code review or bug triage can run on a schedule or fire in response to an external signal, typically a webhook: a callback that fires when something specific happens in a codebase, such as a pull request failing a check. Enterprise tiers add integration points for connecting to monitoring and deployment pipelines.
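The trigger model VentureBeat describes, a routine waking when a webhook reports a failed pull request, can be sketched roughly as follows. The payload shape and function names here are illustrative assumptions, loosely modeled on GitHub-style check events; Anthropic's actual Routines API is not public.

```python
import json

def should_trigger_routine(payload: dict) -> bool:
    """Return True when the webhook reports a failed pull-request
    check, the kind of signal that would wake a scheduled routine.
    (Hypothetical payload shape, not Anthropic's API.)"""
    return (
        payload.get("event") == "check_run"
        and payload.get("conclusion") == "failure"
    )

def handle_webhook(raw_body: str) -> str:
    """Decide what to do with an incoming webhook delivery."""
    payload = json.loads(raw_body)
    if should_trigger_routine(payload):
        # In a real deployment this would enqueue the routine
        # on the lab's cloud infrastructure rather than run locally.
        return "routine-enqueued"
    return "ignored"
```

A successful check run would fall through to `"ignored"`; only a failure enqueues work, which is the point of moving execution off the user's machine: the task runs where capacity is managed, not where the event happened.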
The pressure showed up at the consumer tier before the architectural response materialized. Claude briefly topped the U.S. App Store in early March after a wave of users abandoned ChatGPT in protest of OpenAI's Pentagon contract, according to reporting by Forbes and Fortune. The boycott was temporary. The new users were not. Subscribers to Claude Code Max, the $200-per-month tier, began reporting that monthly session allotments were disappearing in hours rather than days. One developer documented a workflow that normally consumed about 10 percent of a five-hour session allotment suddenly burning through the entire allotment in a single run. Others reported quota exhaustion in as little as 19 minutes against an expected five-hour window. Anthropic acknowledged the issue publicly in late March, saying that users were hitting Claude Code's limits far faster than expected and that it was actively investigating. The session-limit tightening that followed affected roughly 7 percent of users during peak hours.
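The reported numbers imply a striking burn-rate multiplier: exhausting a five-hour allotment in 19 minutes means consuming quota roughly sixteen times faster than provisioned. A minimal sketch of that arithmetic:

```python
def burn_rate_multiplier(window_minutes: float, exhausted_in_minutes: float) -> float:
    """How many times faster than provisioned the quota was consumed."""
    return window_minutes / exhausted_in_minutes

# The five-hour (300-minute) session window exhausted in 19 minutes:
print(round(burn_rate_multiplier(5 * 60, 19), 1))  # prints 15.8
```

At that multiplier, a capacity plan built on the nominal window is off by more than an order of magnitude, which is what makes the user reports an infrastructure story rather than a billing complaint.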
What Anthropic shipped with Routines represents the first explicit product-level acknowledgment that the era of effectively unlimited AI, which most roadmaps treated as a stable foundation, has given way to a constraint the product itself has to accommodate. The tiered limits are the most candid part of that signal: Anthropic is rationing execution because the alternative is over-commitment at a scale the infrastructure cannot sustain. Power agreements take months to renegotiate. The teams that architected around unlimited availability are now discovering that the assumption had an expiration date nobody published. Whether this is a genuine turning point or the first step in a longer cycle of constraint and adaptation is the question the next industry report will start answering.
Sources: VentureBeat (April 22, 2026); Datadog State of AI Engineering 2026 (April 21, 2026); GlobeNewswire / Datadog press release (April 21, 2026); The Register (March 31, 2026); Forbes (March 26, 2026); Business Insider (March 27, 2026).