OpenClaw’s security mess is really an agent-security vocabulary problem
OpenClaw’s security problem is not that nobody is looking for bugs. It is that too many people are looking with yardsticks built for normal software, then calling expected agent behavior a critical vulnerability. For builders adopting agent systems, that matters because noisy security reports can bury the real flaws faster than maintainers can fix them.
The project's own numbers make the pressure visible. In an April 30 blog post, OpenClaw said GitHub showed 1,309 security advisories since Jan. 10, with 535 published and 746 closed as invalid. It said 95 of 109 critical-severity reports were invalid, an 87 percent false-positive rate. That is not a comforting statistic. It is a warning that the field still has not agreed on what counts as a vulnerability when software can read files, call tools, ask for permissions, and operate through a human-approved workflow.
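The arithmetic itself checks out, for what that is worth. A quick sanity pass over the post's counts, using only the numbers quoted above:

```python
# Counts from OpenClaw's April 30 blog post; nothing here is independently verified.
total_advisories = 1309   # GitHub security advisories since Jan. 10
published = 535
closed_invalid = 746

critical_total = 109
critical_invalid = 95

# 535 + 746 = 1,281, leaving 28 advisories presumably still open or in triage.
unresolved = total_advisories - published - closed_invalid
print(f"unresolved advisories: {unresolved}")  # 28

# 95 / 109 = 0.8716..., which rounds to the quoted 87 percent.
print(f"critical false-positive rate: {critical_invalid / critical_total:.0%}")  # 87%
```

The 28 unaccounted-for advisories are an inference from the published counts, not a figure OpenClaw reported.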
OpenClaw, the open-source agent infrastructure project behind the Claw coding agent and related runtime tools, is making that argument in public because the alternative is worse. The project said it fixed real bugs, including authentication flaws, privilege confusion, reconnect scope widening, sandbox bypasses, unsafe environment handling, and approval-path mistakes. Those are not philosophical disputes. They are the kinds of bugs that matter when an AI agent can touch a developer's machine.
The hard part is separating those bugs from reports that confuse an agent doing its job with an agent escaping its cage. OpenClaw said its ClawSweeper triage agent now handles issue and pull-request triage so maintainers can keep up with the security-report firehose. The irony is tidy enough: an agent project is using an agent to survive the operational load created by people testing agents.
The same blog post said the team closed more than 700 ClawHub moderation issues in the last month, around 460 of them rescan appeals from skill authors whose submissions were wrongly flagged as suspicious. ClawHub is OpenClaw’s distribution hub for reusable agent skills, so a bad scanner there does more than annoy maintainers. It can turn the marketplace into a queue of false alarms, where legitimate authors wait behind automated suspicion.
That is the useful read on the 87 percent figure. It does not prove OpenClaw is safe. It proves agent-security tooling is still crude. Traditional software scanners can often reason about a fixed application boundary. Agent systems are messier: their normal job is to move across tools, inspect context, and ask a model to decide what to do next. If a report treats every tool call as exfiltration, every permission request as privilege escalation, or every prompt as code execution, the word “critical” stops doing useful work.
The sharpest example is the fight over the Agents of Chaos preprint, a March 2026 academic paper that described failure patterns in AI agents. The paper drew broader attention through coverage such as Trending Topics EU’s write-up and OpenClaw.report’s deep dive. OpenClaw’s rebuttal says the researchers ran OpenClaw in sudo mode with disabled guardrails, broad shell access, and no sandboxing, then described the results as if they represented default behavior. If that characterization is right, the study tested a dangerous configuration and let readers infer a default product failure. If it is wrong, OpenClaw is downplaying a real class of risks by arguing about setup.
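To make the disagreement concrete, here is a minimal sketch of the configuration gap the rebuttal alleges. The key names are illustrative, not OpenClaw's actual settings schema; the point is that every safety-relevant knob differs between the two profiles:

```python
# Hypothetical profiles; key names are illustrative, not OpenClaw's real schema.
DEFAULT_PROFILE = {
    "run_as_root": False,            # unprivileged user, no sudo
    "guardrails": "enabled",         # runtime policy checks on
    "shell_access": "per-command",   # each shell command needs human approval
    "sandbox": "container",          # filesystem and network isolation
}

# The setup OpenClaw's rebuttal attributes to the Agents of Chaos study.
STUDY_PROFILE = {
    "run_as_root": True,             # sudo mode
    "guardrails": "disabled",
    "shell_access": "unrestricted",  # broad shell access, no approval path
    "sandbox": "none",
}

# A severity claim that omits this diff is not describing the default product.
diff = [k for k in DEFAULT_PROFILE if DEFAULT_PROFILE[k] != STUDY_PROFILE[k]]
print(diff)  # all four knobs differ
```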
Either way, the dispute points at the same missing layer: shared evaluation rules. Agent security needs a vocabulary for the difference between “the system did what an authorized user configured it to do,” “the system asked for too much,” and “the system broke containment.” Without that vocabulary, researchers, maintainers, and enterprise buyers end up fighting over severity labels instead of reducing risk.
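It is worth noticing how little machinery that vocabulary actually requires. Here is a sketch of the three-way distinction as code, under the assumption that an agent's actions, its grants, and its task needs can each be described as sets of scopes; the category names and the rule are ours, not an existing standard:

```python
from enum import Enum

class AgentFinding(Enum):
    AUTHORIZED = "did what an authorized user configured it to do"
    OVERBROAD = "asked for, or used, more access than the task needed"
    CONTAINMENT_BREAK = "acted outside anything the configuration granted"

def classify(used: set[str], granted: set[str], needed: set[str]) -> AgentFinding:
    """Classify one agent action by comparing scope sets.

    used:    scopes the agent actually exercised
    granted: scopes the user's configuration allows
    needed:  scopes the task legitimately requires
    """
    if not used <= granted:
        # Only this case is the agent escaping its cage.
        return AgentFinding.CONTAINMENT_BREAK
    if not used <= needed:
        # Within its grants but beyond the task: a design smell, not a breach.
        return AgentFinding.OVERBROAD
    return AgentFinding.AUTHORIZED

# An agent reading a workspace file it was granted access to is not exfiltration.
print(classify({"fs:read"}, {"fs:read", "net:out"}, {"fs:read"}).name)  # AUTHORIZED
```

Only the third category deserves the word "critical" by default; the other two are configuration and design questions.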
The enterprise context raises the stakes. OpenClaw said its foundation now includes contributors or backers from Nvidia, Microsoft, OpenAI, Atlassian, Tencent, and Blacksmith. Nvidia’s own March blog post said OpenClaw passed 250,000 GitHub stars in 60 days, overtaking React as GitHub’s most-starred software project. That kind of adoption makes security triage a supply-chain problem, not a maintainer inconvenience.
There is a skeptical version of this story, and it is fair. The 87 percent false-positive rate comes from OpenClaw interpreting OpenClaw’s own advisory queue. The public GitHub Security Advisories page can show the project’s advisory volume, but it does not independently validate every invalid closure or severity judgment. A self-defensive blog post is still a self-defensive blog post, even when it includes numbers.
The counterweight is that OpenClaw did not simply claim critics were wrong. It named bug classes it fixed, published counts for invalid advisories, described moderation failures in its own skill hub, and argued over the configuration details of a specific research paper. That is more useful than the usual “security is our top priority” wallpaper. It gives outside researchers something to check.
What to watch next is whether agent-security evaluation becomes more precise before adoption outruns the triage system. The next useful benchmark will not just ask whether an agent can be tricked. It will specify the permissions, sandbox, approval path, tool surface, and expected behavior before assigning severity. Until then, the field will keep producing two bad outcomes at once: real bugs that need fixing and critical reports that collapse under first contact with the runtime.
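Concretely, a benchmark case built that way might look like the following sketch. The field names are ours, not an existing schema; the load-bearing idea is that severity is only meaningful relative to a pinned configuration:

```python
from dataclasses import dataclass

@dataclass
class AgentSecurityCase:
    # Field names are illustrative; no such schema exists today.
    name: str
    permissions: set[str]       # scopes granted, e.g. {"fs:read:workspace"}
    sandbox: str                # "container", "vm", or "none"
    approval_path: str          # "per-command", "per-session", or "auto"
    tool_surface: list[str]     # tools actually exposed to the model
    expected_behavior: str      # what a correct agent does in this case
    severity_if_violated: str   # severity attaches to the violation, not the run

injection_case = AgentSecurityCase(
    name="prompt-injected file read",
    permissions={"fs:read:workspace"},
    sandbox="container",
    approval_path="per-command",
    tool_surface=["file_read", "shell"],
    expected_behavior="reads only workspace files and surfaces the injected instruction",
    severity_if_violated="critical",
)
```

A report generated against a case like this answers the configuration question before it answers the severity one.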