Microsoft helped launch Mythos. Bloomberg says the NSA is using it to probe Microsoft code.
Bloomberg reported that National Security Agency staff are testing Anthropic’s restricted cyber model Mythos to hunt for flaws in Microsoft software and other widely used programs. That should worry software vendors even if they have never touched an AI benchmark, because it means machine-speed bug hunting is reportedly moving from lab demos into the workflow of a U.S. intelligence agency.
The awkward detail is Microsoft’s role on every side of this story. Anthropic named Microsoft a launch partner in Project Glasswing, the controlled rollout around Mythos. Microsoft’s Security Response Center said it evaluated an early Mythos snapshot on its own CTI-REALM benchmark and saw substantial gains over prior models. It also said it is testing the model in internal security workflows and will provide gated research preview access through Microsoft Foundry for Azure customers in Project Glasswing. Bloomberg, meanwhile, reported that NSA staff are using the model to search for flaws in Microsoft products.
That combination makes the Mythos story less about Anthropic hype and more about a new accountability loop for software vendors. A model that one company helps benchmark and distribute can reportedly also be used by the state to interrogate that company’s code. If Bloomberg’s reporting holds, the important shift is not just faster vulnerability discovery. It is that the same frontier system can now sit inside the vendor, the cloud channel, and the government defender looking for weaknesses.
Anthropic has been building the case for this since April 7. In its Project Glasswing announcement, the company said Mythos Preview had already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser, and that it was extending access to more than 40 additional organizations with up to $100 million in usage credits and $4 million in direct donations. In a separate technical writeup from Anthropic’s Frontier Red Team, the company said Mythos could identify and exploit zero-day flaws, meaning previously unknown security bugs, across every major operating system and browser during testing. Anthropic also said more than 99 percent of the vulnerabilities it found were still unpatched, which helps explain why it has disclosed little evidence publicly.
That is where the caveat belongs. Most of the strongest capability claims still come from Anthropic itself. Bloomberg’s NSA report matters because it suggests an outside institution with real operational stakes thinks the model is useful enough to test. The MSRC post strengthens the story too. Microsoft said it is putting Mythos into internal processes and exposing it, in limited form, through Microsoft Foundry for Azure customers in Project Glasswing. That is not proof that every Anthropic claim is right. It is proof that another major security organization took the model seriously enough to build workflows around it.
There is some independent evidence that claims like Anthropic’s are not purely theatrical. VulnCheck has cataloged CVEs, the public identifiers used for disclosed software vulnerabilities, tied to Anthropic researchers or Anthropic-linked collaborations in Firefox, FreeBSD, OpenSSL, NGINX Plus, and other projects. Bruce Schneier, who was skeptical of Anthropic’s rollout theatrics, still wrote that the capability jump appears real and noted that the security company Aisle replicated some of Anthropic’s findings with older public models. That does not validate every dramatic claim in Anthropic’s red-team post. It does suggest the underlying pressure is real: vulnerability hunting is getting faster, cheaper, and less dependent on a small pool of elite researchers.
If Bloomberg is right about the NSA testing Mythos this way, the bottleneck shifts from finding bugs to fixing them. Software vendors have long treated vulnerability research as a scarce craft. A restricted model in the hands of the NSA changes that equation. So does a world in which Microsoft can be a benchmark host, a cloud distributor, and a target surface at the same time. The next thing to watch is not another benchmark score. It is whether more vendors start talking publicly about patch volume, disclosure strain, and what happens when machine-speed scrutiny reaches code they cannot update fast enough.