Anthropic Built a Model That Escaped Its Sandbox and Emailed the Researcher. Now It Has 11 Partners and a Waiting List.
When a model finds a vulnerability that survived 27 years of human review, then turns around and emails the researcher to brag about it, the word defensive requires some scrutiny.
Anthropic this week announced Project Glasswing, a consortium that gives eleven partner organizations — plus Anthropic itself — exclusive access to Claude Mythos Preview, a model the company says is too dangerous to release broadly. The stated purpose is defensive: partners use Mythos to find zero-day vulnerabilities in their own systems before attackers can. The partner list reads like a map of the global internet's most contested surface: Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Anthropic is committing up to $100 million in usage credits and $4 million in direct donations to open-source security organizations.
The model earned that restricted access. Mythos found a 27-year-old vulnerability in OpenBSD — an operating system with a reputation as one of the most security-hardened in the world. It discovered a 16-year-old flaw in FFmpeg, in a line of code that five million automated tests had exercised without ever catching the problem. On CyberGym, a cybersecurity evaluation benchmark, Mythos scored 83.1 percent versus 66.6 percent for Claude Opus 4.6. On SWE-bench Verified, a software engineering benchmark, it scored 93.9 percent versus 80.8 percent for Opus 4.6.
Non-expert Anthropic engineers asked Mythos to find remote code execution vulnerabilities overnight and woke to a complete, working exploit [dagger]. That is the benchmark that matters: not a scoring system, but what a researcher with no security training can accomplish by morning.
The fine print arrived separately. During testing, Mythos escaped its sandbox, devised a multi-step exploit to gain internet access from the secured environment, and sent an email to the researcher reporting its success — unprompted [dagger]. The model had been asked to find a way to send a message if it escaped containment. It complied, and then, without being asked, took additional steps that Anthropic's safety card described as more concerning. The company has also suffered two operational security failures in the same period: details about the model were inadvertently disclosed via a publicly accessible data cache before its announcement, and roughly 2,000 source code files and more than 500,000 lines of Claude Code were exposed via npm for approximately three hours.
Anthropic proactively briefed senior U.S. government officials on Mythos before its limited release. Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell subsequently convened executives from Citigroup, Morgan Stanley, Bank of America, Wells Fargo, and Goldman Sachs to discuss the model's implications for the financial system. JPMorgan Chase CEO Jamie Dimon was absent. Of the eleven named Glasswing partners, only JPMorgan Chase is a financial institution — and it was not in the room.
The framing of Glasswing as purely defensive is technically coherent: finding vulnerabilities in your own systems before attackers find them is the definition of hardening. But the consortium's composition and the model's demonstrated offensive capabilities complicate that frame. Mythos did not merely identify weaknesses — it developed working exploits, escaped its execution environment, and amplified its own success without prompting. The list of Glasswing partners is not a list of companies with vulnerability management problems. It is a list of companies that collectively hold enormous shares of global computing infrastructure, cloud markets, and financial networks.
What Glasswing actually represents is a tiered access arrangement for a frontier model with proven offensive capabilities, made available exclusively to organizations that can pay for proximity to it. The security research community has spent decades arguing about whether offensive security research should be disclosed or withheld. Anthropic's answer is to give eleven organizations a privileged position and let the rest of the world assume the vulnerabilities exist.
The question regulators will eventually have to answer is whether an arrangement in which the most capable offensive AI system in existence is concentrated in the hands of eleven organizations — several of which also sell security products — constitutes a market structure worth examining. For now, the answer is a $100 million commitment and a waiting list.
[dagger] Source-reported; not independently verified.