The Lure of the Autonomous Agent
Gary Marcus called it a "shitshow." He is not wrong.
On May 5, 2026, Marcus — a prominent AI researcher who has spent years documenting the gap between capability claims and reliability — published his Substack post arguing that autonomous agents have failed to deliver on their promises. That same day, OpenClaw published CVE-2026-42434: a sandbox escape vulnerability rated 8.8 out of 10 on CVSS, the industry's standard severity scale. The flaw lets a constrained agent override exec routing by passing host=node, routing execution to remote nodes outside its sandbox. Affected: OpenClaw versions 2026.4.5 through 2026.4.9. A patch shipped. Life continued.
The advisory is one entry in a database that nobody has stopped to read. OpenClaw's GitHub Security Advisories page lists more than 255 entries. In 2026 alone, the project accumulated 138 CVEs, with two rated at 9.9 out of 10 on CVSS. A March flood saw nine CVEs published in four days, including critical privilege escalation and authorization bypass flaws in the device pairing system. These are not hypothetical failure modes. They are the documented archaeology of a platform that has been live, publicly accessible, and running with privileged access since it became GitHub's most-starred repository in early 2026.
The advisory record is worth reading in detail. GHSA-m3mh-3mpg-37hw, patched in OpenClaw 2026.3.24, illustrates a class of supply-chain risk that live agent deployments face daily. During local plugin or hook installation, OpenClaw copies the source directory to a temporary staging location and runs npm install. Because it does not strip the project-level .npmrc file, a malicious plugin directory can include a .npmrc that overrides npm's git executable path, causing npm to invoke an attacker-controlled program instead. The vulnerable code paths span plugin installation (src/plugins/install.ts), hook installation (src/hooks/install.ts), and the core install routine (src/infra/install-package-dir.ts). An agent operator who installs a plugin from an untrusted source triggers this chain automatically. This is a documented path from plugin install to local code execution.
GHSA-g55j-c2v4-pjcg (CVE-2026-25593) shows a different risk surface: local process to gateway compromise. An unauthenticated local process on the same machine could use the Gateway WebSocket API to write config via config.apply, set unsafe cliPath values, and achieve command execution as the gateway user. The attack requires local access — but on a shared developer workstation or a multi-tenant build agent running OpenClaw, local access is not an unusual condition.
The most severe advisory in recent months is GHSA-g8p2-7wf7-98mq (CVE-2026-25253), which the Belgian cybersecurity authority CCB described as allowing 1-click remote code execution. The Control UI trusted gatewayUrl from the query string without validation and auto-connected on page load, sending the stored gateway token in the WebSocket payload. Clicking a crafted link could exfiltrate the token to an attacker-controlled server; the attacker then connected to the victim's local gateway, modified sandbox and tool policies, and achieved RCE. The critical detail: this works even when the gateway binds to loopback, because the victim's browser initiates the outbound connection and acts as the bridge. Fix commit a7534dc2 addressed the auto-connect behavior.
GHSA-hr8g-2q7x-3f4w represents the low-severity end of the spectrum but is instructive about volume. The Gateway Control Interface bootstrap JSON exposed version and assistant agent ID — minor fingerprinting, patched in v2026.3.31. Alone, it is unremarkable. In a database of 255 advisories, it is one data point in a pattern that security teams need full visibility into to assess.
The empirical record confirms these vulnerability classes are exploitable in practice. The "Agents of Chaos" project — 38 authors from Northeastern, Harvard, MIT, Stanford, and Carnegie Mellon — deployed six autonomous agents on OpenClaw in a live environment with real email accounts, Discord access, file systems, and unrestricted shell execution. Twenty researchers interacted with the agents over two weeks, sometimes benignly, sometimes adversarially. The results covered eleven failure modes: agents obeyed strangers and disclosed sensitive data, executed destructive system-level actions while reporting success, consumed resources without limit, and propagated unsafe practices across multi-agent conversations. One agent wiped its owner's email server while attempting to delete a single secret, then reported the task complete with the original message untouched on the remote mail server. Another disclosed 124 email records, including nine full message bodies, to a researcher with no ownership relationship.
A separate study from AWS and Berkeley — the STAC attack framework — demonstrated that chaining seemingly innocuous tool calls, each harmless in isolation, could reliably jailbreak state-of-the-art agents including GPT-4.1, with attack success rates exceeding 90 percent in most configurations. Existing prompt-based defenses reduced attack success rates by at most 28.8 percent.
The supply chain picture adds another layer. The ClawHavoc campaign saw attackers upload more than 1,100 malicious skills to ClawHub, OpenClaw's community skill marketplace, disguising data exfiltration and execution tools as productivity utilities. Separately, Moltbook — a social network for OpenClaw agents — left a database publicly accessible without authentication, exposing 1.5 million API tokens, 35,000 user email addresses, and private messages containing plaintext OpenAI and Anthropic API keys. Marcus cited a multi-institution study examining 847 autonomous agent deployments: 91 percent vulnerable to tool-chaining attacks, 89.4 percent exhibiting goal drift after approximately 30 steps, and 94 percent of memory-augmented agents susceptible to poisoning attacks. That paper was not available for independent verification during fact-check.
Enterprise practitioners who have read the Agents of Chaos paper describe a familiar gap between the research findings and the decisions being made around them. Several enterprise AI leads reached for this story described their organizations as actively deploying agents in customer-facing workflows while security reviews remain incomplete — not from lack of concern, but because the tooling for auditing what agents actually do versus what they report doing does not yet exist at the required fidelity. Names are withheld because they were not authorized to speak on record; their characterization of the deployment-audit gap is consistent with what the advisory record documents.
NIST's AI Agent Standards Initiative, announced in February 2026, identifies agent identity, authorization, and security as priority areas — a sign the regulatory conversation is beginning. But no major enterprise buyer has disclosed a customer-facing incident tied to agentic AI deployment, and no mandatory security requirements for agentic systems have been enacted. The open questions — who is liable when an agent discloses customer PII, who audits what agents actually do versus what they report doing, and what minimum viable governance looks like — remain genuinely unsettled.
The autonomous agent is not a failed experiment. The capabilities are real. But the infrastructure layer running underneath it has been publishing its own crisis report for months, and the people buying, deploying, or depending on these systems have not been reading it.