Mythos Finds 2,000 Bugs in Cloudflare, 400 Critical
Cloudflare has spent years triaging security vulnerabilities the same way most large organizations do: prioritize the critical ones, deprioritize the low-severity ones, and let the rest accumulate in a backlog. What Anthropic's Mythos model found in their systems suggests that approach may be finished. The model, which Cloudflare tested through Anthropic's Project Glasswing early access program, was able to take low-severity bugs sitting in that backlog — the kind engineers typically ignore for months — and chain them into single severe exploits without human involvement. Cloudflare documented the technique in a blog post.
The finding matters because it reframes the vulnerability question. Cloudflare found 2,000 bugs across its critical-path systems using Mythos, with a false positive rate better than human testers — 400 of those were high- or critical-severity. Mozilla, another Glasswing partner, found and fixed 271 vulnerabilities in Firefox 150 while testing the same model, over 10 times more than it found in Firefox 148 using the previous-generation Claude Opus 4.6, according to Anthropic's blog. Those numbers come from Anthropic's own Glasswing data — a self-reported figure the company has not independently audited. The model that found them has not been released publicly, because — as Anthropic put it — no company, including Anthropic, has developed safeguards strong enough to prevent misuse.
The access-restricted Glasswing program is Anthropic's answer: a curated group of partners who agree to coordinated disclosure timelines. One finding that has not yet been fully disclosed is what Mythos constructed when it targeted wolfSSL, a cryptography library used in billions of devices. The model built an exploit that could forge certificates appearing legitimate to end users — a technique that earned a CVE designation CVE-2026-5194. Anthropic says a full technical analysis is still pending, but the CVE assignment signals the vulnerability earned a critical severity rating under the standard scoring system — the kind that typically triggers emergency patching across enterprise infrastructure.
The UK AI Security Institute quantified how fast the window is closing: in February 2026 it estimated that the length of cyber tasks AI models can complete has been doubling every 4.7 months since reasoning models emerged in late 2024, according to AISI's blog, meaning the gap between a vulnerability being found and it being exploited is compressing faster than most organizations can respond.
Palo Alto Networks saw the effects directly. It released 26 CVEs covering 75 issues in May 2026's Patch Wednesday — five times its normal volume — after scanning more than 130 products using AI-driven techniques, the company wrote. Based on its current methodology, it modeled a narrow three-to-five-month window for defenders to move first before AI-driven exploits become routine — a projection based on current vulnerability-discovery rates, not a measured empirical trend. The uncomfortable independent confirmation came from Bruce Schneier's blog, which cited the security company Aisle replicating the same vulnerabilities Anthropic found using older, cheaper, publicly available models when given specific targets — the security moat around frontier AI scanning capability is narrower than the access restrictions suggest.
What to watch next is whether the human side of the remediation pipeline can compress at all. The average time to patch a high- or critical-severity bug found by Mythos is roughly two weeks, according to Anthropic-disclosed Glasswing data — unaudited and partner-reported. That figure is consistent with what NIST's enterprise patch management guidance describes as typical for production systems, which still have to clear testing, deployment, and rollback constraints before patches become reality. Whether organizations across sectors are experiencing this backlog at the scale Glasswing data suggests is an open question — the Glasswing figures are partner-disclosed, not independently measured. The economics of security shift accordingly — fewer organizations can afford the human bottleneck when the discovery side runs at machine speed.