A small security research shop says it spent about $1,000 in compute to have an autonomous AI agent audit FFmpeg, the open-source media library that handles video inside browsers, streaming backends, and most of the editing and transcoding tools in use today. The agent, by the shop's account, turned up 21 previously unknown vulnerabilities, several of which had sat in the codebase for 15 to 20 years and survived both continuous fuzzing and manual audits.
The vendor framing matters, and the source has to be read on those terms. depthfirst's research post is a first-party capability claim from the company that did the work, not an independent benchmark. Eight of the 21 findings carry CVE identifiers (CVE-2026-39210 through CVE-2026-39218) that depthfirst states are already assigned; those entries had not been independently confirmed in MITRE or NVD at the time of writing, and the descriptions should be checked against the public databases before the report is treated as authoritative. The remaining 13 are tracked internally as DFVULN-117 through DFVULN-127.
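Checking the identifiers against the public database is mechanical. The sketch below assumes the NVD CVE API 2.0 endpoint and its `cveId` query parameter, and only builds the lookup URLs; it makes no network request, and the helper name `nvd_lookup_url` is illustrative. It uses only the two endpoints of the range quoted in the report and does not guess the identifiers in between.

```python
import re
from urllib.parse import urlencode

# Assumed endpoint: the public NVD CVE API, version 2.0.
NVD_CVE_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

# CVE identifiers are "CVE-", a four-digit year, then four or more digits.
CVE_ID = re.compile(r"CVE-\d{4}-\d{4,}")


def nvd_lookup_url(cve_id: str) -> str:
    """Return the NVD lookup URL for one CVE identifier, rejecting malformed IDs."""
    if not CVE_ID.fullmatch(cve_id):
        raise ValueError(f"not a well-formed CVE identifier: {cve_id!r}")
    return f"{NVD_CVE_API}?{urlencode({'cveId': cve_id})}"


# Only the first and last identifiers named in the report are used here.
for cve in ("CVE-2026-39210", "CVE-2026-39218"):
    print(nvd_lookup_url(cve))
```

Internal tracking labels such as DFVULN-127 are rejected by the format check, which is the intended behavior: they have no entry in a public CVE database to look up.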
The most concrete piece of work in the source is a reproducible proof-of-concept against DFVULN-127, a heap buffer overflow in FFmpeg's AV1 RTP depacketizer, in the file rtpdec_av1.c. A 183-byte packet delivered through the ordinary ffmpeg -i rtsp://attacker/stream invocation is enough to trigger it, according to depthfirst's walkthrough, which traces the bug to a "Temporal Delimiter 'ignore and remove'" branch that advances an internal pointer without allocating the buffer it then reads from. The post walks through a 64-byte posix_memalign allocator geometry on Linux that lets the attacker overwrite an adjacent AVBuffer struct's function pointer, producing what depthfirst describes as a remote-code-execution primitive, a class of bug where a crafted input gives the attacker code execution on the target machine.
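The general bug class is easier to see in miniature. The sketch below is not FFmpeg's code and does not reproduce DFVULN-127; it is a toy parser over a made-up packet layout (one type byte, one length byte, then the payload), with a hypothetical `SKIPPED_TYPE` standing in for an element that is dropped rather than copied. The point it illustrates is that the bounds check has to run on every branch that moves the cursor, including the branch that only skips data.

```python
from typing import List, Tuple

SKIPPED_TYPE = 2  # hypothetical type code for an element that is dropped, not copied


class MalformedPacket(ValueError):
    """Raised when an element header or declared length runs past the packet end."""


def split_elements(packet: bytes) -> List[Tuple[int, bytes]]:
    """Split a toy packet into (type, payload) pairs, dropping skipped elements.

    Toy layout (an assumption for illustration only): each element is
    one type byte, one payload-length byte, then the payload itself.
    """
    elements: List[Tuple[int, bytes]] = []
    pos = 0
    while pos < len(packet):
        if len(packet) - pos < 2:
            raise MalformedPacket("truncated element header")
        kind, length = packet[pos], packet[pos + 1]
        pos += 2
        # Validate the declared length BEFORE the cursor moves, on every
        # branch. Skipping the check on the "ignore" path is the mistake.
        if length > len(packet) - pos:
            raise MalformedPacket("declared length exceeds remaining bytes")
        if kind != SKIPPED_TYPE:
            elements.append((kind, packet[pos:pos + length]))
        pos += length
    return elements


# One kept element, one skipped element, one empty kept element.
good = bytes([1, 2, 0xAA, 0xBB, SKIPPED_TYPE, 1, 0xCC, 3, 0])
print(split_elements(good))
```

In Python an out-of-range slice is harmless, which is why the check is written out explicitly here. In C the same omission leaves a pointer outside the buffer it is supposed to index, which is the kind of condition the report describes.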
The caveat is important. Only DFVULN-127 has a published end-to-end RCE walkthrough in the report. The other 20 findings are catalogued with reproduction notes and bug-class summaries, but the source does not claim, and a careful reader should not assume, that all 21 are equally exploitable in practice. depthfirst also does not address the practical question of reachability: which RTSP libraries ship with which browsers, which production deployments actually transcode untrusted media, and how often the affected code paths are reachable in a default install.
The bug catalogue is broad. Affected components span demuxers (TS, DASH, AVIF, CAF, AVI), decoders and muxers (VP9, yuv4mpegenc, img2enc), RTP depacketizers (AV1, JPEG, LATM, MPEG-4 AAC), server and client paths (RTSP, RTMP), swscale, and the option parser. Most of the bugs are heap and stack buffer overflows, with a few integer overflows and one heap underflow. The spread is itself part of the story: a parse-heavy C codebase that ingests hundreds of media formats is structurally a zero-click attack surface, in the sense that it routinely parses untrusted bytes from the network with no human in the loop.
The more interesting claim is the cost curve. depthfirst pegs its scan at roughly $1,000 in compute and frames that as about a tenth of what Anthropic spent on its Mythos FFmpeg scan. Anthropic's April 7, 2026 post confirms prior work on FFmpeg but does not, on its own, confirm the implied $10,000 figure; that number is depthfirst's stated comparison, not one Anthropic has published. depthfirst also positions its 21-finding result against Google Big Sleep's previously disclosed 13 FFmpeg vulnerabilities, citing Google's project through a goo.gle/bigsleep redirect that this article has not independently resolved. The 13-finding figure should therefore be read as depthfirst's attribution to Big Sleep, not as a count anyone else has verified. A 21-versus-13 leaderboard across two different scan designs, two different code states, and two different disclosure windows is exactly the kind of comparison that looks tidy in a press release and falls apart on inspection.
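For scale, the arithmetic behind those figures is short. The sketch below takes depthfirst's stated numbers at face value, including the $10,000 comparison that is depthfirst's attribution and not an Anthropic-published number; the function and constant names are illustrative.

```python
def cost_per_finding(total_cost_usd: float, findings: int) -> float:
    """Average compute cost per reported finding."""
    if findings <= 0:
        raise ValueError("findings must be positive")
    return total_cost_usd / findings


DEPTHFIRST_COST_USD = 1_000    # depthfirst's stated compute spend
DEPTHFIRST_FINDINGS = 21       # depthfirst's stated finding count
COMPARISON_COST_USD = 10_000   # depthfirst's attribution, unconfirmed by Anthropic

per_finding = cost_per_finding(DEPTHFIRST_COST_USD, DEPTHFIRST_FINDINGS)
ratio = DEPTHFIRST_COST_USD / COMPARISON_COST_USD

print(f"cost per finding: ${per_finding:.2f}")  # cost per finding: $47.62
print(f"cost ratio: {ratio:.0%}")               # cost ratio: 10%
```

The per-finding average treats all 21 findings as equal, which the report itself does not support: only one of them comes with a published end-to-end exploit walkthrough.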
The cost-curve framing is the load-bearing one, because the part of the pipeline that has gotten cheaper is the part the model is good at: parsing large C codebases, hypothesizing adversarial inputs, and iterating. The part that has not gotten cheaper is everything downstream: triage, severity tiering, coordinated disclosure with upstream maintainers, and answering in public whether a given bug is actually reachable in a default install. That is where the policy work lands now. Discovery is cheap; the rest of the disclosure pipeline is not.
What to watch. First, whether the eight CVE identifiers land in MITRE and NVD with descriptions that match the components depthfirst names. Second, whether DFVULN-117 through DFVULN-127 follow a coordinated disclosure timeline and which, if any, get patches upstreamed through FFmpeg's public Git. Third, whether the DFVULN-127 RCE primitive reproduces on allocator geometries other than the 64-byte posix_memalign case depthfirst analyzed, which is the practical line between a research artifact and a live weapon. Fourth, whether anyone outside depthfirst runs the same scan independently and confirms the 21-count on a fresh code state. The story worth following is the cost curve, not the bug list.