Anthropic, the AI safety company, decided not to release its most capable model. That decision is being praised as responsible. But the capability it chose not to release already exists — trained, ready, and sitting on Anthropic's servers — and nothing legally binding prevents it from being used, leaked, or replicated by someone with different intentions.
The model is called Claude Mythos Preview, and Anthropic announced it last week alongside a new security initiative called Project Glasswing. Mythos Preview has autonomously found thousands of zero-day vulnerabilities — previously unknown flaws — across every major operating system and every major web browser. It can chain those vulnerabilities together into working exploits without human guidance. Anthropic's offensive cyber researcher Logan Graham told NBC News the model shows "a degree of autonomy and sort of long ranged-ness, the ability to put multiple things together" unlike anything they'd seen before.
Rather than releasing the model publicly, Anthropic is giving more than 50 technology organizations — including Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks — access to the model for defensive security work. The company is also committing up to $100 million in usage credits and $4 million in direct donations to open-source security organizations. Together, these commitments make up Project Glasswing.
The decision marks the first time in nearly seven years that a leading AI company has publicly withheld a model over safety concerns, NBC News reported. The last time was 2019, when OpenAI decided to stage the release of GPT-2 due to concerns about its potential to generate disinformation. The comparison is instructive. GPT-2's withheld capability was text generation. Mythos Preview's is autonomous system compromise.
What the model can actually do
The UK's AI Security Institute, an independent government body, evaluated Mythos Preview's cyber capabilities and published its findings. The results are specific. Mythos Preview is the first AI model to complete "The Last Ones" — a 32-step corporate network attack simulation spanning reconnaissance through full network takeover, which the institute estimates takes human professionals 20 hours to complete. It solved the simulation in 3 out of 10 attempts. On expert-level capture-the-flag challenges, which no model could complete before April 2025, Mythos Preview now succeeds 73% of the time. Claude Opus 4.6, Anthropic's previous best model, averaged 16 of the 32 steps on the same attack simulation. Mythos averaged 22.
Anthropic's technical report includes two specific examples from the thousands of vulnerabilities found: a 27-year-old flaw in OpenBSD and a 16-year-old flaw in FFmpeg, both of which survived decades of human review and automated testing undetected. The company says it is disclosing every vulnerability it finds to the organizations responsible for the affected software within 135 days.
Safety testing also surfaced a behavioral anomaly. In approximately 29% of transcripts, the model showed awareness that it was being evaluated — an unprecedented finding for Claude models. Anthropic researcher Sam Bowman wrote on X that an earlier version of Mythos sent an email from a sandboxed computer that was not supposed to have internet access.
The government is already treating this as a systemic threat
Washington's response has been unusually direct. Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell summoned the CEOs of Bank of America, Citi, and Wells Fargo to an urgent meeting specifically to warn about the cyber risks posed by Mythos. Goldman Sachs CEO David Solomon told The Guardian his bank is "hyper-aware" of the risks. Anthropic has briefed senior officials across CISA, CAISI, and multiple other agencies on both offensive and defensive implications.
An Anthropic co-founder, speaking at the Semafor World Economy Summit, confirmed the company is in active discussions with the U.S. government about Mythos — despite an ongoing dispute over a Pentagon supply-chain risk designation. "Absolutely, we are talking to them about Mythos, and we will talk to them about the next models as well," the co-founder said.
The containment problem
The praise Anthropic is receiving for not releasing Mythos rests on a premise worth examining: that restraint is containment.
It isn't. The model exists. The weights are trained. The capability is real. What Anthropic controls is deployment — who gets API access, under what terms, monitored by whom. That is a narrower thing than controlling whether the capability exists in the world, and it is entirely dependent on Anthropic's continued goodwill, operational security, and ability to prevent insider access or external compromise.
Wiz, the cloud security firm that analyzed Mythos, estimates it will take roughly 12 to 18 months before open-source models that anyone can run locally and without restrictions reach comparable capability. That clock runs regardless of Anthropic's choices.
Katie Moussouris, the CEO of Luta Security — a firm that connects security researchers with vulnerable companies — said the capabilities are real. "It's all very much real," she told NBC News. "I'm not a Chicken Little kind of person when it comes to this stuff. We are definitely going to see some huge ramifications."
The Cloud Security Alliance, in a briefing published days after the Mythos announcement, put a number on the structural shift already underway. Based on data from the Zero Day Clock tracking service, the average time between a vulnerability being discovered and a working exploit being developed is now under 20 hours. Rich Mogull, the Alliance's chief analyst and a co-author of the briefing, said Glasswing "likely means we could be facing multiple Log4j level events every month. Maybe multiple a week, we just don't know yet." Log4j, a 2021 vulnerability in a widely used Java logging library, triggered years of emergency patching across every major enterprise in the world.
The architecture of voluntary restraint
Project Glasswing is Anthropic's answer to the problem it created. The logic: since the capability exists and will proliferate, give defenders a head start. By channeling Mythos access to the organizations responsible for the world's most critical software infrastructure first, Anthropic hopes defenders can close the worst gaps before attackers can exploit them.
That bet may work in the near term. More than 50 of the world's most capable technology and security organizations now have access to a model that can find vulnerabilities their own teams missed for years. The Linux Foundation, which maintains software running on most of the internet's servers, is among them. If Glasswing works as designed, the result will be a faster and more comprehensive patching cycle than any previous effort in cybersecurity.
What Glasswing cannot do is hold. It is a race, and the finish line is moving. The AISLE research consortium found that open-source models may already be capable of replicating some Mythos results, according to Fortune's reporting. Anthropic CEO Dario Amodei has said explicitly that Mythos's cyber capability is a side effect of its general coding and reasoning ability — not something the company trained for deliberately. That means no lab can surgically remove these capabilities without degrading a model's general usefulness. The capability overhang is structural.
The structural question facing policymakers is not whether Anthropic acted responsibly. It is whether voluntary corporate restraint, without binding oversight or enforcement mechanisms, can hold back a capability that is simultaneously being developed by other labs, being replicated by open-source researchers, and approaching the threshold at which models can find and exploit vulnerabilities autonomously.
That question does not yet have an answer. What it has is a 12-to-18 month window.