How SW and HW Vulnerabilities Can Complement LLM-Specific Algorithmic Attacks (UT Austin, Intel et al.)
Guardrails can stop a jailbreak prompt. They cannot stop a Rowhammer attack
Security research on large language models has a blind spot. For years, the field has focused on algorithmic attacks — prompt injections, jailbreaks, membership inference, model extraction — vulnerabilities that live inside the model itself. A paper published this month by researchers at the University of Texas at Austin and Intel argues that this focus is dangerously incomplete. The real prize for an attacker is not the model. It is the pipeline.
The paper, titled "Cascade," studies what happens when you combine traditional software and hardware vulnerabilities with LLM-specific attacks inside compound AI systems — the kind of multi-component pipelines that power production applications like Microsoft Copilot, GitHub Copilot, and enterprise RAG systems. These pipelines do not just contain a language model. They include a query enhancer, a knowledge database, an agent that orchestrates software tools, and a guardrail model that screens outputs for safety. Each layer is a separate software stack running on distributed hardware. Each layer is a separate target.
The researchers demonstrate two attacks. The first bypasses a guardrail by exploiting a code injection flaw in the query enhancer and a Rowhammer bit-flip attack against the guardrail model itself. Rowhammer is a hardware attack — repeatedly accessing DRAM rows to induce bit flips in adjacent memory cells. It is a technique that predates LLMs by a decade. Here, it is used to flip a safety decision in the guardrail model, allowing a jailbreak prompt to reach the underlying LLM unaltered. The guardrail never sees the attack because the attack bypasses it at the hardware level.
The second attack is simpler and more concrete. The researchers manipulate a knowledge database — the RAG component common in enterprise deployments — to redirect an LLM agent into transmitting sensitive user data to a malicious application. The model is not fooled by a prompt. The database is compromised, and the agent follows its instructions.
The paper introduces a Cascade Red Teaming Framework — a systematization of attack primitives across three layers: algorithmic (prompt injection, jailbreaks), software (code injection, SQL injection, malicious packages), and hardware (Rowhammer, timing attacks, power side channels). The key insight is compositional. Attacks at different layers can be chained. A software vulnerability can grant the access needed for a hardware attack. A hardware attack can disable a guardrail that would otherwise block a prompt injection. Single-layer defenses assume these layers are independent. They are not.
The implications for frontier labs and cloud AI providers are significant. Companies like OpenAI, Anthropic, and Google run compound AI pipelines at scale. Their safety research focuses heavily on algorithmic alignment and red-teaming at the model layer. Cascade shows that a motivated attacker with access to the software stack or the underlying hardware can circumvent those safety measures without touching the model at all. The authors note that hardware attacks like Rowhammer are inherently more difficult to mitigate because the underlying vulnerabilities lie outside the scope of algorithmic defenses — and can persist across model retraining.
The paper is a reminder that AI safety is not purely a model problem. The infrastructure that surrounds the model — the databases, the orchestration frameworks, the hardware — is part of the attack surface. Securing it requires looking beyond alignment research.
The paper is Cascade: Composing Software-Hardware Attack Gadgets for Adversarial Threat Amplification in Compound AI Systems, by Sarbartha Banerjee, Prateek Sahu, Anjo Vahldiek-Oberwagner, Jose Sanchez Vicarte, and Mohit Tiwari, submitted to arXiv March 12, 2026.