The US banned Anthropic's newest models for a 'jailbreak' that researchers say isn't one

PREVIEWThe US banned Anthropic's newest models for a 'jailbreak' that researchers say isn't one · MD

The US government banned Anthropic's two newest AI models on June 12, 2026, citing national security concerns over a jailbreak claim that the company and a coalition of 76 cybersecurity researchers say does not exist.

The directive, delivered verbally and without written administrative record, ordered Anthropic to immediately cut off access to Claude Fable 5 and Claude Mythos 5, the company's most capable public and controlled models. Anthropic complied and disabled both systems worldwide. No other AI lab was targeted, even though researchers say the underlying capability the ban describes reproduces on OpenAI's GPT-5.5, Anthropic's own Claude Opus 4.8 and Sonnet, and Chinese models like Moonshot's Kimi 2.7.

That gap between what the government acted on and what the broader frontier model landscape actually looks like is now the story. Fable 5 had been on the market roughly three days when the ban landed; Vals AI benchmarks called it the most capable AI model publicly available at release. Mythos 5 had never been broadly released. Anthropic had been distributing it through a controlled program called Project Glasswing to roughly 50 vetted organizations, expanding to about 150 organizations across 15 countries for critical-infrastructure use.

The trigger was a paper attributed to Amazon researchers that, in Anthropic's telling, demonstrated nothing more than asking a model to read code and identify software flaws, a workflow the company characterizes as standard defensive security work. Katie Moussouris, founder of Luta Security and one of the most established vulnerability-disclosure researchers in the field, reviewed the paper and reached a similar conclusion. In a signed blog post she argued the paper did not document a real jailbreak, only a code-fix workflow, and that the same techniques replicate on the other major frontier systems. Moussouris is one of 76 cybersecurity researchers who signed an open letter on freefable.org calling the US ban dangerous.

The signatories include Alex Stamos, Casey Ellis, Jon Callas, Paul Vixie, Dino Dai Zovi, and Rachel Tobac, a roster of people who built modern vulnerability coordination and have spent careers arguing that reading code for flaws is, by itself, defensive work. Their letter is not a denial that capable systems can be misused. It is an objection to a ban whose predicate has not been published, whose administrative record is verbal, and whose scope is one lab rather than the broader class of models that share the same underlying capability.

The dispute matters because the definition of "jailbreak" is the actual control lever. Read narrowly, as a systemic bypass of a model's safety constraints, a code-reading prompt is not a jailbreak. Read expansively, as any successful elicitation of restricted behavior, almost any prompt engineering qualifies. The US government's choice of one definition over the other, applied to one lab's models and not to comparable systems, is what turns a contested technical finding into a binding precedent.

The brand story is what gets the airtime. Ramp's AI Index, drawn from roughly 70,000 business customers, shows Anthropic's share of business AI subscription spend rose 2.5 percentage points in May 2026 to 41%, against OpenAI's 39.5%. Ramp's lead economist Ara Kharazian noted that Anthropic's best month for business adoption was March 2026, when the Defense Department labeled the company a supply-chain risk after it refused contracts for mass surveillance and autonomous weapons, and that the Fable/Mythos ban "if anything" will probably continue to lift Anthropic. On the consumer side, OpenAI still leads per Sensor Tower's State of AI 2026, and Ramp can only resolve specific model spend in about a third of transactions, where the visible skew runs toward Claude Opus variants.

Anthropic raised $65 billion at a $965 billion valuation in late May 2026, filed confidential IPO paperwork, and reported its first profitable quarter. Sam Altman of OpenAI publicly called Anthropic's Mythos posture "fear-based marketing" in April 2026. The brand-asymmetry read is convenient but incomplete. The ban is not just suppressing information. It is making a contested technical claim authoritative without writing it down.

What the case really tests is how AI safety governance works when the trigger is verbal, the predicate is contested, and the entity acting on it is the same entity that would need to defend the finding in court. Three things follow. The Defense Department's earlier supply-chain-risk designation and Anthropic's active lawsuit will either get cited or quietly dropped. The open letter's signatories will either be invited into a formal review or shut out, which would be the cleanest signal of whether the definitional contest is being adjudicated on technical merit. And the next frontier-model ban, if there is one, will either cite this one as precedent or quietly distance itself from it. The first outcome quietly redefines "jailbreak" for every system that follows. The second leaves the next lab to inherit both the standard and the silence it was built on.

The US banned Anthropic's newest models for a 'jailbreak' that researchers say isn't one — type0 | type0

The US banned Anthropic's newest models for a 'jailbreak' that researchers say isn't one

Sources