The Security Industry Has a $20 Problem

Two weeks ago, Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell sat down with the CEOs of Citigroup, Bank of America, Goldman Sachs, Morgan Stanley, and Wells Fargo to discuss a single AI model and what it could do to the financial system. The model, Anthropic’s Mythos, had found thousands of vulnerabilities across every major operating system and browser — including a bug in OpenBSD that had gone undetected for 27 years. It was restricted to a small group of enterprise partners. It was not for sale. Fortune | Anthropic
On Thursday, OpenAI released GPT-5.5 to anyone with a $20 monthly subscription. The vulnerability detection scores were nearly identical. OpenAI
“Mythos-like hacking, open to all,” as one security firm put it.
That phrase is the story — not as a product announcement, but as an economic reckoning. The cybersecurity industry has spent three decades selling a simple premise: elite vulnerability researchers are rare, expensive, and worth paying a premium for because the attacks they find and stop are sophisticated. The market for penetration testing, red team engagements, and zero-day research has been built on human scarcity. GPT-5.5 does not eliminate those researchers. But it makes their entry-level work — the kind that finds well-hidden but not maximally sophisticated vulnerabilities — suddenly cheap and widely accessible. That is a different kind of disruption.
The numbers are stark. On CyberGym, a benchmark that tests whether an AI can find vulnerabilities in real codebases, GPT-5.5 scored 81.8 percent. Anthropic’s Mythos, the restricted model that prompted the emergency Wall Street briefing, scored 83.1 percent. On a separate benchmark from security firm XBOW, GPT-5.5 reduced its miss rate from 40 percent with GPT-5 to 10 percent — a 30-percentage-point step change in a single model generation. Claude Opus 4.6, the previous best-in-class for this kind of work, sat at 18 percent. OpenAI | XBOW | The New Stack
OpenAI classified GPT-5.5 as “High” cyber risk under its internal Preparedness Framework — not “Critical,” the tier that would have triggered additional safeguards and possibly delayed the rollout. The distinction matters: it reflects the company’s judgment that the model’s offensive capabilities, while genuine, do not cross the threshold where the risks outweigh the benefits of broad release. That calculus may be right. It also reflects a commercial incentive to avoid the restrictions that a Critical designation would impose. OpenAI System Card
The comparison to prior waves of democratization is imperfect but instructive. When the Metasploit Framework was first released as open source in 2003, the security industry warned that script kiddies would overwhelm the internet with automated exploits. Instead, the tool shifted the floor for legitimate security work upward — defenders adopted it faster than attackers did, because defenders had more to gain from efficiency. Bug bounty platforms and Kali Linux followed the same pattern: commoditizing the tools of offense tended to benefit organized defense more than disorganized attack, at least initially.
The open question is whether GPT-5.5 represents a phase change in that pattern. Previous commoditization lowered the cost of known attack techniques. What makes this different is that GPT-5.5 can find novel vulnerabilities — not just automate known exploits. The 10 percent miss rate means it still fails regularly. But on tasks where it succeeds, it is finding classes of bugs that required specialized expertise and significant time just months ago.
Defenders have not been slow to notice. Project Glasswing, the consortium Anthropic assembled around Mythos, includes Apple, Amazon, Cisco, CrowdStrike, JPMorgan, Microsoft, NVIDIA, Palo Alto Networks, and nine other major technology and financial firms. The premise of that partnership was that the companies most exposed to sophisticated AI-driven vulnerability research would also be the first to use it — turning the capability into a defensive asset before it matured into a threat. With GPT-5.5 broadly available, that exclusive advantage is gone. Anthropic
What comes next is a recomposition of where value lives in the security industry. High-end red team work — the kind that mimics nation-state adversaries, finds ultra-sophisticated vulnerabilities, and operates under active opposition — remains genuinely scarce and will continue to command premium pricing. But the bulk of enterprise security testing, the category that fills the gap between automated scanners and elite consultancies, faces real pressure. If a $20 subscription can do entry-level vulnerability research competently, the economic case for paying hundreds of dollars an hour for the same category of work weakens.
Security firms are aware of this. Several large vendors have publicly committed to integrating frontier AI models into their own workflows — effectively betting that their advantage lies not in owning vulnerability research capability, but in interpreting and acting on its results faster than clients can on their own. That may be the right bet. It is also a narrower bet than the one they made when scarcity was structural rather than temporary.
For now, the practical impact on most organizations is likely to be modest. GPT-5.5’s security capabilities are real, but a benchmark miss rate of 10 percent still means one in ten vulnerabilities slips through. Automated vulnerability scanners have existed for years; they have not eliminated the need for skilled practitioners, only changed what those practitioners do. GPT-5.5 is more capable than those tools, and it is more generally applicable. The analogy to those earlier tools is not perfect. But the pattern — a new capability that raises the floor while leaving the ceiling to human expertise — is familiar.
What is less familiar is the pace. The jump from 40 percent miss rate to 10 percent happened in one model generation. The jump from restricted enterprise access to a consumer subscription happened in two weeks. The economic adjustment for the security industry will be slower than the capability release cycle. That gap, between how fast the tools change and how fast the market adapts, is where the real pressure will land.
The emergency briefing that Bessent and Powell convened was about a model that almost nobody outside a small partner cohort could access. Two weeks later, the same capability class is in a product that shipped to millions of users. Fortune The regulatory and economic conversation that followed that briefing may need to happen again, with different participants and different stakes.