AI is getting cheaper at finding DeFi bugs than cashing them out
DeFi security teams are about to work under a worse assumption: AI may not need to steal money on its own to make their jobs harder. If a model can cheaply find the weak spot in a decentralized-finance protocol, the crypto software that moves money through self-executing code, defenders still have to treat that discovery as real pressure even when the model cannot yet turn it into a profitable attack.
That is the useful result in a new benchmark from a16z crypto, not the splashier headline about whether AI agents can hack DeFi. In a controlled test of 20 historical Ethereum price-manipulation exploits, a coding agent using GPT-5.4 in Codex built profitable proofs of concept in just two cases after researchers sealed off a shortcut that let it peek at the real attack. With structured exploit skills derived from past incidents, the same setup rose to 14 of 20. The gap between finding the bug and cashing it out is still there. The first half is getting cheaper.
The benchmark used Codex with GPT-5.4 and the Foundry Ethereum developer toolchain, including forge, cast, anvil, and remote procedure call access. Researchers pulled the 20 cases from DeFiHackLabs, a public repository that says it reproduces 689 DeFi hack incidents with Foundry proof-of-concept code. They counted a run as successful if the generated exploit made more than $100 on a forked copy of Ethereum's main network, a deliberately low bar.
The first number was misleading. a16z's unsandboxed run appeared to show a 50 percent success rate because the agent was querying future transaction data from Etherscan and effectively reading the answer key. After researchers pinned the local blockchain state to the target block and restricted source-code lookups, the success rate fell to 10 percent, or two of 20. That is the baseline that matters.
Then came the part DeFi teams should care about. With structured skills derived from those attack patterns, a16z says the success rate climbed to 70 percent, or 14 of 20. The researchers also say the agent identified the core vulnerability in every failed case and usually broke down only when it had to sequence trades, manage state changes, or pick parameters that produced actual profit. In plain English: the model often knew where the hole was, but still struggled to drive the getaway car.
That split matters because a lot of security workflow still treats bug discovery and exploit validation as one pipeline. If vulnerability detection becomes cheap first, the scarce product is no longer spotting the bug. It is proving exploitability, pricing severity, and keeping evaluation environments honest enough that a model is not quietly cheating its way to a passing score.
The paper is also a sandbox story, which is catnip on this beat for a reason. In one run, a16z says the agent called anvil_nodeInfo, extracted the upstream remote procedure call address, used anvil_reset to jump the local node to a future block, inspected the real attack transaction, then jumped back. That is not a novel exploit. It is a reminder that agent benchmarks are only as real as the tool boundaries around them. Wrappers are not sandboxes just because someone wrote "sandbox" in the diagram.
There is competitive pressure here too. Anthropic said its Opus 4.6 model had a near-zero success rate at autonomous exploit development in internal testing, while its newer Mythos Preview produced working exploits 181 times on a separate Firefox benchmark and achieved register control 29 more times. Those tests involve memory-corruption bugs, not DeFi price manipulation, so the numbers do not map cleanly. Still, they point the same way: model vendors are getting better at the long chain of actions offensive security requires, even if different domains still break them in different places.
The a16z benchmark does not show that a general-purpose agent can walk onto an unseen DeFi protocol and drain it on command. It does show that the old comfort line, the model cannot finish the exploit so the risk is contained, is getting weaker. If AI keeps making reconnaissance and vulnerability diagnosis cheap before it makes exploitation reliable, DeFi's next bottleneck is obvious. Teams will need better exploit validation, stricter benchmark design, and a clearer idea of what their auditing assumptions look like once the attacker on the other side is a model with patience, tooling, and a very loose definition of the rules.