The Deception Myth: AI Doesn't Lie — It Just Won't Commit
We fear AI that lies. The real problem might be AI that refuses to commit to anything — ever.

A 1,100-game experiment with Llama 3.2 agents in Among Us reveals that AI deception manifests as equivocation rather than outright lying—agents dodge, hedge, and deflect when pressured but never issue false identity claims. The study found that this evasive behavior doesn't actually improve win rates; directive-heavy, task-coordination speech correlated with crew victories, suggesting equivocation is a byproduct of conflicting training pressures (truthfulness vs. task completion) rather than deliberate strategic manipulation.
- AI agents don't lie outright in social settings — they default to equivocation (hedging, dodging, redirecting) when under social pressure
- This deceptive behavior doesn't improve win rates; directive-heavy speech (task coordination) correlates with victories, not deceptive speech
- Equivocation emerges from competing training pressures: being trained for truthfulness while being optimized for task completion/survival
When you put Llama 3.2 agents in a game of Among Us and run 1,100 rounds, something curious happens: they learn to be economical with the truth. Not dishonest, exactly — just vague enough that nobody can pin them down.
Maria Milkowski and Tim Weninger, researchers at the University of Notre Dame, built that experiment from scratch. They populated games with LLM-controlled players in groups of four to eight, let them vote on whom to eject each round, and logged every utterance. The result is a dataset of more than one million tokens of meeting dialogue across 34,882 coded utterances — one of the larger controlled studies of LLM agent behavior in a social setting. The paper is posted on arXiv and has been accepted for presentation at AAMAS 2026 in Paphos, Cyprus, in May.
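For readers who want the mechanics, a simplified sketch of the per-meeting loop implied by that setup looks something like this: agents speak into a shared transcript, then everyone votes on an ejection. The function name `query_agent` and the `phase` argument are placeholders for illustration, not the authors' actual code, which lives in their repository.

```python
from collections import Counter

def run_meeting(players, transcript, query_agent):
    """One discussion round followed by an ejection vote (illustrative only)."""
    # Discussion phase: every utterance is appended to the shared transcript.
    for speaker in players:
        utterance = query_agent(speaker, transcript, phase="discuss")
        transcript.append((speaker, utterance))

    # Voting phase: each agent names one player to eject, or "skip".
    votes = Counter(
        query_agent(voter, transcript, phase="vote") for voter in players
    )
    target, _count = votes.most_common(1)[0]
    return None if target == "skip" else target
```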
The headline finding is not that agents lie. It's that they lie badly — or rather, they don't lie at all in the way you'd expect. "Deception appears primarily as equivocation rather than outright lies," the researchers write. Agents dodge, hedge, and redirect. They never issue a false declaration of their own identity. What looks like deception is mostly just evasion.
The equivocation rate climbs when agents are under social pressure — when other players are voting against them, when the evidence is mounting. But here's the part that complicates the narrative: it doesn't help. Across all game configurations, "deception rarely improved win rates," the paper states. Impostors — the agents whose goal is to eliminate crewmates undetected — won more often than crewmates, and that advantage grew as the number of impostors increased. But the deceptive speech itself was not the cause. Directive-heavy discussions — lots of task-oriented commands and coordination — correlated with crew victories, not impostor wins. The paper frames it as a false signal: directive density signals coordination, not guilt.
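One way to sanity-check a claim like that, sketched loosely: treat each game as a row with its directive share and its outcome, then compute a point-biserial correlation. The values below are invented for illustration; only the shape of the analysis is implied by the paper.

```python
from scipy.stats import pointbiserialr

# One record per game: fraction of utterances coded as directives,
# and whether the crew won (1) or the impostors won (0). Illustrative values.
directive_share = [0.99, 0.97, 0.98, 0.95, 0.99, 0.96]
crew_won        = [1,    0,    1,    0,    1,    0]

# Correlation between a binary outcome and a continuous rate.
r, p = pointbiserialr(crew_won, directive_share)
print(f"point-biserial r = {r:.2f}, p = {p:.3f}")
```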
This matters beyond the game for a specific reason. The agents were trained on Llama 3.2, which has been alignment-tuned to be truthful. Their deceptive language "reflects the competing pressures of training for factuality and for responsive task completion rather than deliberate manipulation," the researchers write. That's a more interesting conclusion than "AI learns to deceive." The model isn't optimizing for lying — it's optimizing for winning, and evasiveness is the path of least resistance when your training tells you to be honest but your objective says survive. Equivocation is what truthfulness-pressure and task-pressure produce when you mix them wrong.
The speech act taxonomy is revealing in its own right. The researchers coded utterances into five categories — directives (task commands), representatives (statements about beliefs or explanations), declarations (identity claims), commissives (promises), and expressives (reactions). Across all games, 98 percent of utterances were directives. Impostors produced a slightly higher proportion of representatives — 1.7 percent versus 0.5 percent for crewmates — a difference the researchers confirmed with a chi-squared test (χ²(3)=103.85, p<.001). Before a vote that would eject them, impostors shifted marginally toward representative speech and away from directives, a behavioral tell that could theoretically be exploited — though the paper doesn't claim the current model is sophisticated enough to do so reliably. The specific percentages and the chi-squared value come from the paper's internal analysis and have not been independently verified against outside sources.
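For the statistically inclined, the reported test is a standard chi-squared comparison of a role-by-speech-act contingency table, along the lines of the sketch below. The counts are made up for illustration; only the test itself mirrors the paper's method.

```python
from scipy.stats import chi2_contingency

# Hypothetical role-by-category counts (directives, representatives,
# declarations, expressives). A 2x4 table gives 3 degrees of freedom,
# matching the chi-squared(3) statistic the paper reports.
crewmate_counts = [19_600, 100, 30, 70]
impostor_counts = [14_200, 250, 40, 90]

chi2, p, dof, _expected = chi2_contingency([crewmate_counts, impostor_counts])
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4g}")
```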
The GitHub repository (mmilkowski36/AmongUs) contains the full game logs, prompts, and classifier code. Gemini categorized the 34,882 utterances across three runs to test stability. Everything is reproducible, which is rarer than it should be in AI behavior research.
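Three runs of the same classifier invite an agreement check. A minimal sketch of one way to do it: take per-utterance majority labels and the rate at which all three runs agree. The labels below are invented, and the repository's actual stability analysis may differ.

```python
from collections import Counter

def stability(run_a, run_b, run_c):
    """Unanimous-agreement rate and majority label per utterance."""
    unanimous = 0
    majority_labels = []
    for labels in zip(run_a, run_b, run_c):
        label, freq = Counter(labels).most_common(1)[0]
        majority_labels.append(label)
        unanimous += (freq == 3)
    return unanimous / len(run_a), majority_labels

runs = (
    ["directive", "directive", "representative", "directive"],
    ["directive", "expressive", "representative", "directive"],
    ["directive", "directive", "representative", "directive"],
)
rate, labels = stability(*runs)
print(f"unanimous agreement: {rate:.0%}")  # 75% on this toy example
```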
There are obvious limits. Among Us is a social deduction game — its mechanics reward deception structurally, which makes it a useful proxy for studying the behavior but not a neutral environment. The model is Llama 3.2, a relatively capable but not frontier-class LLM. The game rules are fixed. In the real world, agents face different incentive structures, different social contexts, and consequences that don't reset when the vote goes wrong.
What's durable is the finding about where deception comes from. When you train a model to be truthful and then ask it to win a game that requires hiding, it will find the laziest path between those two goals — which turns out to be equivocation, not fabrication. Whether that finding holds in higher-stakes deployments is the question worth tracking. The Notre Dame team's dataset gives researchers a baseline to check.
Maria Milkowski and Tim Weninger are researchers at the University of Notre Dame. Their paper is available on arXiv.
Sources
- arxiv.org
- cyprusconferences.org
- github.com

