OpenAI is done competing with Anthropic on who has the best model. It is competing on who can build the platform that everyone else wraps around.
The April 2026 update to the Agents SDK is the clearest signal yet of which game OpenAI is playing. The headline feature is not a new model — it is durable execution, a checkpoint-and-recover system that lets agent pipelines survive infrastructure failures without restarting from scratch. It is also the most opinionated infrastructure bet OpenAI has made: that the bottleneck for AI agents in production is not capability, it is reliability, and the company that solves reliability owns the platform layer. Ramp, the expense management company, has already made that bet. It runs coding agents on Modal that are responsible for more than half of its PRs — a deployment that survived the infrastructure underneath it, not a demo that died with the container.
The technical foundation is real. With built-in snapshotting and rehydration, the SDK can restore an agent's working state in a fresh container if the original environment fails or expires. The agent picks up where it left off. The context is not regenerated. The work is not lost. The mechanism works through seven supported sandbox providers — Modal, Vercel, Cloudflare, E2B, Blaxel, Daytona, and Runloop — and is paired with the Responses API, which unifies what previously lived across the Assistants API and Chat Completions API. The older Assistants API will be deprecated in mid-2026. Standard API pricing applies.
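The checkpoint-and-recover pattern described above can be sketched in a few lines. To be clear, this is an illustrative sketch of the general mechanism, not the SDK's actual API: the class names, `snapshot`/`rehydrate` methods, and the in-memory checkpoint store are all assumptions for demonstration.

```python
import json


class AgentState:
    """Working state an agent accumulates: conversation context plus a step counter.
    Hypothetical stand-in for whatever the SDK actually serializes."""

    def __init__(self, context=None, step=0):
        self.context = context or []
        self.step = step

    def snapshot(self) -> str:
        # Serialize the full working state so it can outlive the container.
        return json.dumps({"context": self.context, "step": self.step})

    @classmethod
    def rehydrate(cls, blob: str) -> "AgentState":
        # Restore the state in a fresh environment from the last checkpoint.
        data = json.loads(blob)
        return cls(context=data["context"], step=data["step"])


def run_with_checkpoints(state, tasks, store):
    # Resume from wherever the last checkpoint left off; checkpoint after each step.
    for task in tasks[state.step:]:
        state.context.append(f"did:{task}")
        state.step += 1
        store["latest"] = state.snapshot()
    return state


store = {}
run_with_checkpoints(AgentState(), ["plan", "edit"], store)

# Simulate the container dying: discard the live object, start fresh
# from the checkpoint, and continue with the remaining work.
recovered = AgentState.rehydrate(store["latest"])
resumed = run_with_checkpoints(recovered, ["plan", "edit", "test"], store)
print(resumed.context)  # ['did:plan', 'did:edit', 'did:test']
```

The key property is the one the article emphasizes: after recovery, the first two steps are not re-executed and the context is not regenerated; the rehydrated agent does only the remaining work.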
The provider-agnostic design is a deliberate break from what the SDK was before April. Earlier versions locked developers into OpenAI models only. The changelog now lists support for more than a hundred LLMs from other providers — a README assertion that has not been independently benchmarked for equivalence under the SDK's abstraction layer. Python is available now; TypeScript support is planned for a later release. The codebase has 20,700-plus GitHub stars and over 4,900 projects depending on it.
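A provider-agnostic design of this kind usually reduces to a routing layer that maps model identifiers to backends behind one interface. The sketch below illustrates that idea only; the registry, prefixes, and factory functions here are hypothetical and are not the SDK's actual provider abstraction.

```python
from typing import Callable, Dict

# A completer takes a prompt and returns a completion string.
Completer = Callable[[str], str]


class ModelRegistry:
    """Route provider-prefixed model names (e.g. 'openai/gpt-x') to backends.
    Illustrative only: not the Agents SDK's real interface."""

    def __init__(self):
        self._providers: Dict[str, Callable[[str], Completer]] = {}

    def register(self, prefix: str, factory: Callable[[str], Completer]) -> None:
        # Each provider contributes a factory that builds a completer per model name.
        self._providers[prefix] = factory

    def resolve(self, model: str) -> Completer:
        prefix, _, name = model.partition("/")
        if prefix not in self._providers:
            raise KeyError(f"no provider registered for '{prefix}'")
        return self._providers[prefix](name)


registry = ModelRegistry()
# Stub backends standing in for real API clients.
registry.register("openai", lambda name: lambda prompt: f"[openai:{name}] {prompt}")
registry.register("other", lambda name: lambda prompt: f"[other:{name}] {prompt}")

complete = registry.resolve("other/some-model")
print(complete("hello"))  # [other:some-model] hello
```

This also shows why the article's caveat matters: the abstraction makes a hundred models callable through one interface, but says nothing about whether they behave equivalently under it.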
The counterforce is the same as it is for any infrastructure claim without a production benchmark behind it. The durable execution feature has no independent measurement of recovery latency — how long it actually takes to rehydrate an agent's state versus starting fresh — across the full set of sandbox providers. The Ramp example is one company's implementation on Modal, which is also a sandbox provider. Whether the guarantees hold at scale, under load, on the other six providers, is still an open question. The TypeScript release will matter for the ecosystem that has been waiting for it.
What changes if it holds: the agent framework conversation shifts from capability to reliability. Every framework publishes task success rates. None of them has a built-in answer to the question every developer asks when their container dies at 2 a.m. OpenAI is betting that answer is worth building into the SDK, and that being the company that builds it means being the platform.
What to watch: whether any sandbox provider publishes recovery latency data — the number that separates production-grade durability from production-adjacent claims.
Sources: OpenAI blog | Modal Blog | TechCrunch | GitHub | OpenLinks infographic