When a company builds an AI pipeline that routes a task across multiple AI agents, each one looks clean to security tooling. A document reader looks clean. A compliance checker looks clean. A summarizer looks clean. But a paper from researchers at the University of Central Florida, published to arXiv on April 17 and accepted to the ACL 2026 conference, shows that the composition itself can be weaponized. The research demonstrates a class of attacks called conjunctive prompt attacks, where a trigger buried in a user's query and a hidden template embedded in one remote agent activate simultaneously only when routing brings them together. No single component appears malicious. No standard defense catches it.
The practical implications arrived before the academic framing did. In March 2026, Mercor disclosed a cyberattack tied to a compromise of the open-source LiteLLM project, which proxies requests across multiple language model providers. The attack path in that breach matches the structural vulnerability the UCF team describes: a compromised intermediate layer made each downstream request look legitimate. What researchers are now quantifying is how general this pattern is, and how little existing tooling does to stop it.
The UCF team tested their attack across five language models: Gemma-2B, Mistral-7B, LLaMA-3-8B, Llama-4-Scout-17B-16E-Instruct, and GPT-5-mini. They ran it across three common agent communication structures: a central client routing to specialized agents, sequential agent chains, and branching directed-acyclic graphs, where an agent can route output to multiple downstream agents without cycles. In every topology, routing-aware optimization, which tunes where trigger keys are placed and how routing bias is set, substantially increased attack success while keeping false activations low. The attack does not modify model weights or client code. It works purely at the prompt level.
Existing defenses fail for a structural reason. PromptGuard and Llama-Guard, the two most widely deployed prompt-injection defenses, evaluate individual components. They see a trigger key in isolation, which looks like an ordinary user query, and a hidden template in isolation, which looks like a standard system instruction. Neither flags the combination. Tool restrictions, which block agents from calling sensitive functions, address the consequences of a successful attack but not the attack itself. "No single component appears malicious in isolation," the researchers write, "and existing defenses do not reliably stop the attack because defense failure is structural." The issue appears in the OWASP Top 10 for Agentic Applications 2026 as a known risk category for multi-agent systems.
The attack surface is not exotic. Multi-agent LLM systems are the default architecture for enterprise AI deployments that route sensitive data, including legal documents, financial records, and customer communications, across specialized models. A client agent decomposes the task and routes segments to remote agents. One of those remote agents may be an external API, a third-party tool, or a hosted endpoint the company does not fully control. The compromise does not need to touch the company's own models or code. It needs a foothold in one node of the routing graph.
The researchers have released their attack code on GitHub. The goal, they say, is to force the field to build defenses that reason over routing and cross-agent composition, something no current commercial tooling does. For enterprises running multi-agent pipelines, the uncomfortable question is whether they already have compromised remote agents in their stacks, and whether they would know if they did.
What to watch: whether major model providers update their safety tooling to evaluate cross-agent composition rather than individual prompts, and whether the March Mercor incident accelerates enterprise security audits of agent routing graphs. The vulnerability is structural. The fix will need to be too.