OpenAI Wants to Pay Someone to Break Its Bio Safety Guardrails. Nobody Outside the Room Will Know If They Succeed
On Thursday, OpenAI released its most biology-capable model to date. The same day, it opened a bug bounty: $25,000 for the first researcher to find a universal jailbreak, a single prompt that defeats the model's bio safety guardrails across all five questions in its safety challenge. The testing window runs April 28 through July 27. All findings are covered by NDA. The public will learn nothing of what is discovered inside the program.
The model is called GPT-5.5. Under OpenAI's own Preparedness Framework, it rates High for biological and chemical capabilities — the second-highest tier in a system designed to flag models that could assist novice actors in creating biological or chemical threats. A High rating means the model can provide meaningful help to someone with basic technical training who is trying to synthesize known pathogens or toxins. OpenAI published this rating alongside the model release, in the same blog post that announced the bounty.(OpenAI Blog)
That pairing is the tension the announcement never resolves: the company deployed a model it had rated as capable of assisting biological harm, then immediately offered to pay hackers to test whether its own guardrails against that exact risk were any good. The bug bounty is not a sign of failure. It is an admission dressed as due diligence.
"The model can provide meaningful counterfactual assistance to novice actors with basic relevant technical background that enables them to create known biological or chemical threats," according to the Preparedness Framework's definition of a High biological rating — a definition OpenAI publishes and that SaferAI, an independent AI safety ratings platform, has documented. GPT-5.5 meets that threshold.(SaferAI Ratings)
The prior iteration of this program tells a curious story. The GPT-5 bio bug bounty, announced in August 2025, tested against ten bio and chem safety questions. The GPT-5.5 program tests against five. The reduction coincides with the model's jump to a High biological rating, and OpenAI has not explained it: the blog post announcing the new bounty never addresses why fewer questions are in play for a more capable model.(OpenAI Blog)
All of this happens under NDA. Participants cannot discuss what they find. OpenAI cannot be cross-examined on whether the five questions are representative, whether the universal jailbreak threshold is meaningfully hard, or whether any researcher has already come close. The company says it will strengthen safety protections based on what it learns. It will not say what it learns.
Independent biosecurity experts have noted that LLMs already outperform most expert virologists within the virologists' own areas of specialization, and that novices using LLMs perform four times better on biosecurity-relevant tasks than novices with internet access alone.(EA Forum) The gap between those capabilities and the safeguards meant to contain them is exactly what the bug bounty is meant to probe. The problem is that the probe points inward: OpenAI is paying for the information, and OpenAI decides what to do with it.
The biological research community has noticed the model's capabilities. Derya Unutmaz, an immunology professor at the Jackson Laboratory for Genomic Medicine, used GPT-5.5 Pro to analyze a gene-expression dataset with 62 samples and nearly 28,000 genes, work he said would have taken his team months. The blog post introducing GPT-5.5 highlights similar use cases in drug discovery and genetics research. The model excels at reasoning over molecules, proteins, genes, and disease-relevant biology: the same capabilities that raise misuse concerns under the Preparedness Framework.(OpenAI Blog)
OpenAI frames the bounty as a sign of seriousness. The company says it evaluated GPT-5.5 across its full suite of safety and preparedness evaluations, ran targeted testing for advanced biology and cybersecurity capabilities, and worked with internal and external red-teamers before release. The bug bounty is positioned as the next step.
But the structure of the program — NDA-covered, results withheld, findings controlled — means the public cannot verify whether OpenAI's internal rigor was sufficient. A model rated High for biological capabilities is now in use. The company is paying researchers to find flaws in its safeguards. And the researchers cannot tell anyone what they find.
That is not a safety protocol. It is a liability management exercise wearing the costume of one.