Who Verified the Number Everybody Cites on AI Agent Security?


Uber has published the most detailed public account yet of what a production solution to the AI agent accountability problem looks like. On May 21, 2026, the company's engineering blog described a full production system: cryptographic workload identity, a token service that issues short-lived credentials for each step an agent takes, and a policy gateway that checks every tool call against the full chain of actors, from the human who asked for something to the agent that acted on it. The post included real code, JWT payload examples, and architecture diagrams. It confirms that the accountability gap is real, even if the specific numbers that quantify that gap remain contested.

The accountability gap

The core issue is not complicated. Traditional identity systems were designed for humans or static services. An AI agent acting across multiple hops—calling other agents, invoking tools, modifying systems—breaks both models simultaneously.

"Current identity models don't describe agency," Uber's engineers write in their blog post. "An agent is best defined as an entity that is authorized to act for or in the place of another." Existing credentials cannot capture that relationship. Worse, execution context—the originating user, the intermediate agents—drops at every hop. A pull request opened by a Monitoring Agent shows the agent as the author, not the on-call engineer who asked it to fix the alert.

Without a full audit trail, incident response means stitching together partial logs from every system the agent touched. The difference between a five-minute fix and a two-hour postmortem is whether you can answer: who asked this agent to do what, and why?

The IETF's draft-ni-wimse-ai-agent-identity-02, published in February 2026, frames the same problem in standards language. AI agents need independent trustworthy identities, automated credential management with short validity periods, and fine-grained access tokens scoped to specific tasks. The draft remains a work in progress and expires in September 2026. Uber has been running a solution in production since at least early 2025, according to the blog post.

What the architecture actually does

Uber's system works like this: every workload gets a cryptographic identity from SPIRE, the SPIFFE Runtime Environment. When an agent needs to call another agent or a tool, it exchanges its SPIRE credential for a short-lived JWT (a JSON Web Token, a signed digital credential) from the Security Token Service. The JWT contains the full actor chain: not just "this agent is calling this service" but "on-call engineer → Oncall Agent → Investigation Agent is calling this service." The token is scoped to a single hop and a single destination. It cannot be replayed to call a different system, according to the blog post.
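To make the actor-chain idea concrete, here is a minimal sketch in Python. It is not Uber's code. It assumes the chain is carried in nested `act` claims in the style of RFC 8693, where the outermost `act` is the current actor and nested ones are prior actors. The function names, the principal names, the `example.org` SPIFFE ID, and the shared HMAC key are all hypothetical and chosen only to keep the example self-contained.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical shared secret, for illustration only. A production token
# service would more likely sign with an asymmetric key.
SIGNING_KEY = b"demo-signing-key"


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> str:
    digest = hmac.new(SIGNING_KEY, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return _b64url(digest)


def issue_hop_token(chain, audience, ttl_seconds=60, now=None):
    """Mint an HS256 JWT for one hop.

    chain[0] is the originating human; chain[-1] is the agent making the call.
    """
    now = int(time.time()) if now is None else now
    # Nest the agents so the current caller ends up outermost (RFC 8693 style).
    actor = None
    for agent in chain[1:]:
        actor = {"sub": agent} if actor is None else {"sub": agent, "act": actor}
    claims = {"sub": chain[0], "aud": audience, "iat": now, "exp": now + ttl_seconds}
    if actor is not None:
        claims["act"] = actor
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return signing_input + "." + _sign(signing_input)


def verify_hop_token(token, expected_audience, now=None):
    """Return the claims, or raise ValueError if the token is not valid here."""
    now = int(time.time()) if now is None else now
    # Unpacking raises ValueError if the token does not have three parts.
    header_b64, claims_b64, signature = token.split(".")
    if not hmac.compare_digest(signature, _sign(header_b64 + "." + claims_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims["aud"] != expected_audience:
        raise ValueError("token was issued for a different destination")
    if now >= claims["exp"]:
        raise ValueError("token expired")
    return claims


def actor_chain(claims):
    """Flatten the nested act claims into [human, first agent, ..., caller]."""
    actors = []
    node = claims.get("act")
    while node:
        actors.append(node["sub"])
        node = node.get("act")
    return [claims["sub"]] + actors[::-1]
```

In this sketch, a token minted for one destination fails verification anywhere else because the receiving service compares the `aud` claim with its own identity, and the short `exp` window limits how long a leaked token is useful. A receiving service can call `actor_chain` on the verified claims to recover the full path from the human to the calling agent for its audit log.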

The MCP Gateway enforces tool-level policies at each call, checking identity against risk classification categories. The AI Gateway handles external LLM calls, with separate guardrails for prompt injection and PII redaction.
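A tool-level policy check of this kind can be sketched as follows. This is an illustration under stated assumptions, not Uber's policy engine: the risk categories, tool names, principal names, and the rule that every actor in the chain must be cleared for the tool's risk level are all hypothetical.

```python
# Hypothetical risk categories, ordered from least to most sensitive.
RISK_LEVELS = {"low": 0, "medium": 1, "high": 2}

# Hypothetical classification of tools by risk category.
TOOL_RISK = {
    "read_dashboard": "low",
    "open_pull_request": "medium",
    "restart_service": "high",
}

# Hypothetical ceiling on the risk category each principal may invoke.
MAX_RISK = {
    "user:oncall-engineer": "high",
    "agent:oncall": "high",
    "agent:investigation": "medium",
}


def authorize_tool_call(actor_chain, tool):
    """Return (allowed, reason). Deny by default.

    The call is allowed only if every actor in the chain, from the
    originating human to the calling agent, is cleared for the tool's risk.
    """
    tool_risk = TOOL_RISK.get(tool)
    if tool_risk is None:
        return False, f"unknown tool {tool!r}"
    if not actor_chain:
        return False, "empty actor chain"
    for actor in actor_chain:
        ceiling = MAX_RISK.get(actor)
        if ceiling is None:
            return False, f"unknown actor {actor!r}"
        if RISK_LEVELS[ceiling] < RISK_LEVELS[tool_risk]:
            return False, f"{actor} is not cleared for {tool_risk}-risk tool {tool!r}"
    return True, "ok"
```

Checking the whole chain rather than only the immediate caller is the design choice that matters here: in this sketch, an agent with a medium ceiling cannot restart a service even when a fully privileged engineer started the workflow, and the denial reason names the actor that blocked the call.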

What Uber built is a reference architecture for a problem most companies are only beginning to understand they have. Uber's scale—thousands of microservices, multiple agent platforms, complex multi-hop workflows—makes it an unusually credible test case. It also makes Uber's experience potentially unrepresentative. Most enterprises are earlier in the agent deployment curve, with fewer hops, fewer agents, and less sophisticated infrastructure.

The gap behind the headline number

The statistic that appears in nearly every article about MCP security is this: 53 percent of MCP servers run on static API keys, the digital equivalent of a password taped to a monitor. Only 8.5 percent use OAuth, the modern standard for secure delegated access. Astrix Security published that finding in its State of MCP Server Security 2025 report, based on analysis of more than 5,200 open-source MCP server implementations. In March 2026, Rock Lambros at RockCyber Musings called it the most-cited number in the space. The figure has since become the baseline statistic for agent security conversations.

The secondary coverage that cites the figure does not document which repositories were sampled, how the authentication method was determined for each one, or when the analysis was run. For a statistic repeated this widely, that is a thin foundation.

There is a plausible explanation for the gap. Open-source MCP servers are often hobby projects or early-stage tools. Enterprise deployments behind corporate firewalls may have substantially better security practices—or substantially worse ones, with no public data either way. Astrix Security has incentives to publish striking numbers. That is not the same as those numbers being wrong, but it means the 53 percent figure should be treated as a signal worth investigating rather than a fact worth citing.

The accountability gap that both Uber's architecture and the Astrix statistic point to is real. As agents move from proof-of-concept to production, as they handle more consequential tasks across more complex chains, the question of whether you can trace an action to a specific human through a specific agent chain moves from an audit inconvenience to a compliance requirement. The EU AI Act's requirements for high-risk AI systems include logging and traceability. Regulated industries—finance, healthcare, critical infrastructure—face versions of this requirement now.

What to watch next is whether the IETF WIMSE draft, which expires in September 2026, advances fast enough to matter. Uber solved the problem for its own infrastructure. The broader question of whether the industry is building agent systems faster than it can secure them is still open.

Research verification note: The Astrix Security methodology and sampling frame could not be independently confirmed before the publication deadline. The 53 percent figure is reported as cited in secondary sources including RockCyber Musings and the IETF draft-ni-wimse-ai-agent-identity-02. The description of Uber's architecture is based on the company's May 21, 2026 engineering blog post. The IETF WIMSE draft remains in progress as of publication.
