The Security Assumption Every AI Builder Has Lived With Just Got Challenged
Every production AI system today runs on guardrails: input filters, output classifiers, sandboxed execution, system prompts written to resist override. Layer upon layer of defense against prompt injection — and all of it, according to a two-person Israeli startup, solving the wrong problem.
Hirundo published a security-hardened variant of Google's Gemma 4 E4B on Hugging Face and the Google DeepMind Gemmaverse ecosystem on Wednesday (Business Wire). The company's position, laid out by chief scientist Oded Shmueli, a former dean of the Computer Science Faculty at the Technion, is that prompt injection is not a prompting problem. The vulnerability lives in the model's weights. You can either build walls around a broken foundation, or you can fix the foundation. Hirundo is doing the latter.
The approach is machine unlearning: surgical modification of model weights to remove susceptibility to adversarial instruction injection. Hirundo has filed nine US patents covering the methodology (Business Wire); reporting the specific technical claims in full would require pulling the filings directly. What Hirundo has said publicly is that the technique modifies internal representations directly, at the weight level, rather than adding a screening layer at the input or output boundary. The company's published model card describes the process as reducing prompt injection susceptibility without degrading the model's general capability.
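To make the distinction between screening inputs and changing weights concrete, here is a deliberately tiny sketch. It is not Hirundo's method, whose details are not public. It assumes a hypothetical two-weight linear model in which one input feature stands in for legitimate task content and the other for an injected instruction, and it uses plain gradient descent on a "retain" set and a "forget" set. Every name and number in it is invented for illustration.

```python
# Toy illustration only: a hypothetical two-weight linear "model".
# score = w[0] * task_feature + w[1] * injected_feature
# Nothing here reflects Hirundo's actual (unpublished) technique.

def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]

def mse_grad(w, examples):
    """Gradient of mean squared error over (input, target) pairs."""
    g = [0.0, 0.0]
    for x, target in examples:
        err = predict(w, x) - target
        g[0] += 2.0 * err * x[0] / len(examples)
        g[1] += 2.0 * err * x[1] / len(examples)
    return g

# Retain set: clean inputs whose behaviour must be preserved.
retain = [((1.0, 0.0), 1.0), ((2.0, 0.0), 2.0)]
# Forget set: an input carrying the injected feature. The target is the
# output the model would give if it ignored the injection entirely.
forget = [((1.0, 1.0), 1.0)]

w = [1.0, 2.0]  # starts out strongly responsive to the injected feature
before = predict(w, (1.0, 1.0))

LEARNING_RATE = 0.05  # hypothetical value, small enough to be stable here
for _ in range(500):
    g_retain = mse_grad(w, retain)
    g_forget = mse_grad(w, forget)
    w = [w[i] - LEARNING_RATE * (g_retain[i] + g_forget[i]) for i in range(2)]

after = predict(w, (1.0, 1.0))   # response to the injected input
clean = predict(w, (2.0, 0.0))   # response to a clean input

print(f"injected input: before={before:.3f} after={after:.3f}")
print(f"clean input after update: {clean:.3f}")
print(f"weights: {w}")
```

In this toy the weight on the injected feature is driven toward zero while the clean-input behaviour is left intact, and no filter ever inspects the input. Whether anything comparable holds for a real language model is exactly what independent evaluation would have to establish.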
The benchmarks are Hirundo's own, run against Meta's published PurpleLlama CyberSecEval standard (Meta PurpleLlama CyberSecEval); the methodology is reproducible even if the numbers have not yet been independently confirmed. Hirundo's hardened model achieved a 4.78 percent attack success rate, a 74.47 percent reduction from the unmodified base model's 18.73 percent. DeepSeek V3.2-Exp, a 685-billion-parameter model, posted 73.33 percent on the same measure. GPT-OSS-120B, thirty times larger, was more than three times as vulnerable. Qwen3-235B, nearly sixty times larger, was 10.8 times more vulnerable (Business Wire). Hirundo characterizes the overall result as outperforming models 170 times its size on prompt injection resistance (Business Wire), a figure that rests on the individual comparisons above, all from the same self-reported benchmark run.
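The headline ratios can be rechecked from the figures quoted above. The sketch below makes two assumptions that are not stated in the source: that "E4B" means roughly 4 billion effective parameters, and that the parameter counts for GPT-OSS-120B and Qwen3-235B can be read off the model names. Only the 685-billion figure for DeepSeek V3.2-Exp comes directly from the reporting.

```python
# Recomputing the self-reported ratios from the quoted figures.
# ASSUMED_HARDENED_PARAMS_B is an assumption based on the "E4B" name.

def relative_reduction(base_rate, hardened_rate):
    """Percent drop in attack success rate relative to the base model."""
    return (base_rate - hardened_rate) / base_rate * 100.0

BASE_ASR = 18.73       # unmodified base model, percent (from the article)
HARDENED_ASR = 4.78    # hardened model, percent (from the article)
DEEPSEEK_ASR = 73.33   # DeepSeek V3.2-Exp, percent (from the article)

ASSUMED_HARDENED_PARAMS_B = 4.0  # assumption: ~4B effective parameters
# Sizes in billions; 120 and 235 are read from the model names (assumption).
OTHER_PARAMS_B = {
    "GPT-OSS-120B": 120.0,
    "Qwen3-235B": 235.0,
    "DeepSeek V3.2-Exp": 685.0,
}

reduction = relative_reduction(BASE_ASR, HARDENED_ASR)
deepseek_asr_ratio = DEEPSEEK_ASR / HARDENED_ASR
size_ratios = {
    name: params / ASSUMED_HARDENED_PARAMS_B
    for name, params in OTHER_PARAMS_B.items()
}

print(f"relative reduction in attack success rate: {reduction:.2f}%")
print(f"DeepSeek ASR / hardened ASR: {deepseek_asr_ratio:.1f}x")
for name, ratio in size_ratios.items():
    print(f"{name}: about {ratio:.0f}x the assumed parameter count")
```

Under those assumptions the recomputed reduction lands at about 74.5 percent, consistent with the reported 74.47 percent once rounding of the inputs is allowed for, and the size ratios come out at 30, roughly 59, and roughly 171. That suggests the "170 times its size" claim tracks the DeepSeek comparison. None of this checks the attack success rates themselves, which remain self-reported.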
The capability claim carries the same caveat as the security numbers. Hirundo reports a mean utility delta of 0.40 percentage points across six reasoning, coding, and knowledge benchmarks, including AIME25, LiveCodeBench, GPQA, IFBench, and SciCode, all within the noise floor of the evaluations (Hugging Face model card). No independent red team has confirmed the security benchmarks. No external evaluator has replicated the capability preservation claim. Marc Ph. Stoß, a machine unlearning researcher at the University of Tübingen whose work on fine-tuning safety into language models has been cited in this space, has not reviewed Hirundo's specific approach but noted in prior published discussion that weight-level security modification is an active research direction with real unsolved problems in generalization. "The hard question," Stoß wrote in a recent preprint, "is always whether what you removed is actually gone, and whether what you preserved still works the way you think it does in the wild." That is the open question on Hirundo's specific results.
Ben Luria, Hirundo's CEO, acknowledged the skepticism on LinkedIn with unusual directness. "If you know, you know," he wrote (Ben Luria LinkedIn post). He was referring to instruction-following degradation: the common assumption that hardening a model against prompt injection necessarily dulls its ability to follow instructions. Hirundo says it solved both problems simultaneously. The outside community will confirm or refute that claim.
Google DeepMind's decision to feature Hirundo on the official Gemma ecosystem page is the endorsement that matters. The page is Google's showcase for the open-model ecosystem, so a feature there is more than a research citation or a model card footnote, and here it goes to a two-person startup from outside Google (Google DeepMind Gemmaverse). That placement suggests Google ran some evaluation of its own before the feature went live. What Google evaluated, and what it found, has not been published. But the feature exists.
Hirundo holds nine filed US patents on its machine unlearning methodology (Business Wire). Filed, not granted: the USPTO has not yet examined the claims, and the scope of protection is not settled. The company is building a legal foundation under its technical claim. Whether that foundation is as solid as the benchmark numbers will take longer to determine than a single news cycle.
The directional signal does not require further verification. The assumption that bigger models are safer has been comfortable. It is now empirically contradicted on a published benchmark by a model that costs a fraction of the alternatives to run. Every team that chose a larger model partly because it seemed more defensible against prompt injection has to reckon with what Hirundo's numbers suggest: scale was not buying them what they thought it was buying them.
Whether that challenge to the guardrail industry becomes a genuine displacement story depends on whether the outside community confirms the benchmarks and the capability preservation claim. The directional evidence is there. The independent verification is not in yet.