When researchers ran eight different AI systems against a 17-year-old flaw in FreeBSD's networking code, all eight found it, including a model small enough to run on a laptop and cheap enough that processing a million words costs roughly a tenth of a cent, according to AISLE. The finding undercuts the premise that the best AI vulnerability detection requires the biggest, priciest models. As detection gets cheap and ubiquitous, the harder question becomes what happens after someone finds a door: who patches it, how fast, and which systems cannot be patched at all.
Anthropic published results for its own still-unreleased model, Claude Mythos Preview, showing that it found a 27-year-old vulnerability in OpenBSD, achieved 73 percent success on expert-level cybersecurity challenges that no prior AI could finish, and autonomously built working exploits from scratch. Engineers at the company with no formal security training asked Mythos to find remote code execution vulnerabilities overnight and woke to complete, working exploits, per the UK AI Security Institute, citing Anthropic's own red team blog. IEEE Spectrum called it a genuine inflection point. AISLE's comparison data suggests the reality is more uneven: smaller, cheaper systems kept pace with Mythos on this particular task.
If small, cheap models can match the flagship at detection, the security advantage is no longer in the model itself but in the scaffolding around it: the infrastructure that converts a detected flaw into a working exploit, a targeting decision, a realistic threat scenario. That pipeline is where the actual bottleneck lives.
Anthropic is distributing Mythos through Project Glasswing, a program that provides access to critical infrastructure operators and open-source security researchers, along with $100 million in usage credits and $4 million in direct donations to open-source security work, per AISLE. The goal is to move defensive benefits into the right hands before the offensive implications compound through the industry.
Bruce Schneier, a security technologist at Harvard, argued this week on his blog that the more pressing concern is not Mythos specifically but the trajectory: as these capabilities become cheaper and more accessible, the bottleneck shifts from finding vulnerabilities to deciding which ones to weaponize, and that second step still requires human judgment about targets, timing, and impact. Defense has to be comprehensive. Attackers only need one open door. Glasswing, he wrote, buys time, not permanence.
The first real test will come when Glasswing's participants publish their own results: whether access to Mythos actually produces faster patches in production systems, or whether the program generates good headlines without changing the underlying security posture of the people who need it most.