Introducing the OpenAI Safety Bug Bounty program
OpenAI formalized something this week that security researchers have been doing informally for two years: treating AI-specific vulnerabilities as a legitimate, fundable discipline. The company launched a Safety Bug Bounty on Tuesday, a companion to its existing Security Bug Bounty program, specifically targeting AI abuse scenarios that fall outside traditional security vulnerability categories.
The hook is the Model Context Protocol. MCP, the protocol that lets AI assistants connect to external tools and data sources, is now explicitly in scope, and OpenAI set a concrete bar for submissions: the behavior must be reproducible at least 50 percent of the time. That is a higher bar than it sounds. Prompt injection is not a buffer overflow. It is a class of probabilistic manipulation, and getting an attack to fire more often than not against a defended target is genuinely hard.
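To make the 50 percent bar concrete, here is a minimal sketch of how a researcher might check a repeated attack against it before submitting. The announcement does not say how OpenAI measures reproducibility, so everything here is an assumption for illustration: the function names, the choice of a Wilson score interval, and the 95 percent confidence level are hypothetical, not part of the program's rules.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a success rate (z=1.96 is roughly 95% confidence)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half_width, center + half_width


def clears_threshold(successes: int, trials: int, threshold: float = 0.5) -> bool:
    """Conservative check (an assumption, not OpenAI's stated method):
    require the lower bound of the interval to reach the threshold."""
    low, _ = wilson_interval(successes, trials)
    return low >= threshold


if __name__ == "__main__":
    for hits, runs in [(6, 10), (70, 100)]:
        low, high = wilson_interval(hits, runs)
        print(f"{hits}/{runs}: interval [{low:.3f}, {high:.3f}], clears 50%: {clears_threshold(hits, runs)}")
```

Under this conservative reading, 6 successes in 10 attempts does not clear the bar even though the raw rate is 60 percent, because the sample is too small to rule out a true rate below half. 70 successes in 100 attempts does. A program that accepts the raw rate alone would be more lenient.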
The real-world context makes this concrete. At RSAC 2026 last week, Michael Bargury, CTO of security firm Zenity, demonstrated zero-click prompt injection attacks against Microsoft Copilot, Google Gemini, Salesforce Agentforce, and ChatGPT. He was showing working attacks, not theory. "AI is just gullible," Bargury told The Register. "We are trying to shift the mindset from prompt injection because it is a very technical term and convince people that this is actually just persuasion." The framing is deliberate: the attack surface is not a software bug; it is a conversation. Bargury has covered Cursor and custom agent platforms in separate Zenity research demonstrations, but those were not part of the RSAC demo itself.
Academic research released this month on the arXiv preprint server quantified the unevenness across the MCP ecosystem. Researchers evaluated seven MCP clients and found significant security disparities. Claude Desktop, Anthropic's client, implements strong guardrails against cross-tool poisoning and unauthorized tool invocation. Cursor, the AI coding assistant, shows high susceptibility to both. The variance is not minor — it reflects the difference between a team that built with adversarial tool invocation in mind and one that did not.
OpenAI's Safety Bug Bounty is organized around three categories: Agentic Risks (including MCP), OpenAI Proprietary Information, and Account and Platform Integrity. The second category covers scenarios where model outputs leak internal reasoning chains or system prompts, a class of issue that standard security programs do not have a framework to evaluate. The third covers manipulation of trust signals that determine what an AI agent will and will not do on a user's behalf.
Jailbreaks are explicitly out of scope. OpenAI runs separate private campaigns for certain harm categories — the company said it handles Biorisk content issues in ChatGPT Agent and GPT-5 through those private programs rather than the public bounty. The distinction is worth noting: public research into model manipulation is separated from the company's own red-teaming process, which means external researchers cannot easily verify how well those private programs work.
The Safety Bug Bounty program does not publish reward tiers. OpenAI's existing Security Bug Bounty, which covers traditional vulnerabilities, caps payouts at $100,000 for exceptional critical findings — an amount OpenAI increased from $20,000, as recorded by Bugcrowd. What the Safety program will pay, and whether that amount is competitive with the going rate for MCP security research, is not public. That matters: if the payout does not match the effort required to find reproducible MCP vulnerabilities, the program will attract submissions that are easy to demonstrate, not ones that reflect real risk.
The 50 percent reproducibility threshold is the most concrete signal in an otherwise sparse announcement. It tells you OpenAI knows prompt injection is hard to pin down reliably. It also tells you they are trying to define a discipline — with rules, standards, and a formal submission process — rather than waiting for chaos to define it for them. Whether that discipline scales with the MCP ecosystem it is meant to protect is the open question.