Auto mode is Anthropic's answer to permission fatigue: a secondary classifier running Claude Sonnet 4.6 that reviews every tool call before it executes, making allow/deny decisions on the user's behalf. It sits above the existing blocklist, below user-defined allow/deny rules, and auto-approves read-only operations and file edits in the working directory. For everything else, it's the classifier or nothing. Anthropic's documentation describes it as "safeguards monitoring actions before they run."
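The layering described reads as a fixed precedence order: blocklist, then user rules, then the auto-approve carve-outs, then the classifier. A minimal sketch of that order in Python, where every name and data structure is my own assumption about behavior the docs only describe in prose:

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath
from typing import Callable, Optional

class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"

@dataclass
class ToolCall:
    name: str
    is_read_only: bool = False
    is_file_edit: bool = False
    path: Optional[str] = None

def review_tool_call(
    call: ToolCall,
    blocklisted: Callable[[ToolCall], bool],
    user_rule: Callable[[ToolCall], Optional[Verdict]],  # None = no rule matched
    classifier_approves: Callable[[ToolCall], bool],
    cwd: str,
) -> Verdict:
    # 1. The hard blocklist always wins.
    if blocklisted(call):
        return Verdict.DENY
    # 2. User-defined allow/deny rules outrank the classifier.
    user = user_rule(call)
    if user is not None:
        return user
    # 3. Read-only operations and file edits inside the working
    #    directory are auto-approved without classifier review.
    in_cwd = call.path is not None and PurePosixPath(call.path).is_relative_to(cwd)
    if call.is_read_only or (call.is_file_edit and in_cwd):
        return Verdict.ALLOW
    # 4. Everything else: the classifier or nothing.
    return Verdict.ALLOW if classifier_approves(call) else Verdict.DENY
```

The interesting property is step 4: there is no fallback prompt to the user in this sketch, which is exactly the "classifier or nothing" framing.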
On entering auto mode, Claude Code first strips any existing allow rules known to grant arbitrary code execution — blanket shell access, wildcarded script interpreters, package-manager run commands — before the classifier sees a single action. Subagents get checked twice: the classifier evaluates the delegated task at spawn time, then reviews the full action history when the subagent returns. The docs note that tool results never reach the classifier, so hostile content in a file cannot directly manipulate it.
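The rule-stripping step can be pictured as a filter over the session's allow list. The patterns below are hypothetical stand-ins for the categories the docs name (blanket shell access, wildcarded interpreters, package-manager run commands); the real matching logic is internal to Claude Code, and the `Bash(...)` rule syntax shown is only an assumption:

```python
import re

# Hypothetical patterns for allow rules that amount to arbitrary
# code execution. The actual list Anthropic uses is not published.
ARBITRARY_EXEC_PATTERNS = [
    re.compile(r"^Bash\(\*\)$"),                              # blanket shell access
    re.compile(r"^Bash\((python|node|ruby|sh|bash)\b.*\*"),   # wildcarded interpreters
    re.compile(r"^Bash\((npm|yarn|pnpm) run\b"),              # package-manager run commands
]

def strip_exec_allow_rules(allow_rules: list[str]) -> list[str]:
    """Drop allow rules that would let arbitrary commands run
    without ever reaching the classifier."""
    return [
        rule for rule in allow_rules
        if not any(p.match(rule) for p in ARBITRARY_EXEC_PATTERNS)
    ]
```

Run against a plausible session config, `["Bash(*)", "Bash(git status)", "Bash(npm run *)"]` would be reduced to `["Bash(git status)"]`: narrow, specific rules survive, while the catch-alls are removed before the classifier takes over.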
Simon Willison, who writes extensively about AI and security, isn't buying it: "I remain unconvinced by prompt injection protections that rely on AI, since they are non-deterministic by nature." He prefers deterministic sandboxing, where the restrictions don't depend on model judgment at all. "I still want my coding agents to run in a robust sandbox by default," he wrote, "one that restricts file access and network connections in a deterministic way."
The same morning auto mode shipped, The Hacker News reported that TeamPCP had published backdoored versions of LiteLLM, a popular Python library with 95 million monthly downloads. The compromise stemmed from the package's use of Trivy in its CI/CD pipeline. The backdoor — a malicious .pth file — was live for hours before the packages were yanked.
Here's where the allow list matters: auto mode's default rules include a "Declared Dependencies" carve-out that permits pip install -r requirements.txt provided the manifest hasn't been modified during the session. The attack: add a typosquatted package to requirements.txt before auto mode is entered, then let Claude run pip install against it. Because the modification predates the session, the classifier has no record of it; the carve-out applies, and the attack is invisible to the guardrail.
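The blind spot falls out of when the baseline is taken. A toy sketch, with all class and method names invented for illustration, of a carve-out that hashes the manifest at auto-mode entry and only flags changes after that point:

```python
import hashlib
from pathlib import Path

class ManifestGuard:
    """Sketch of a 'Declared Dependencies' carve-out: permit installs
    from a manifest only if it hasn't changed since auto mode began.
    Hypothetical; the real logic is internal to Claude Code."""

    def __init__(self) -> None:
        self._baseline: dict[str, str] = {}

    @staticmethod
    def _digest(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def enter_auto_mode(self, manifest: Path) -> None:
        # The baseline is captured here. Anything written to the
        # manifest *before* this call looks pristine to the guard.
        self._baseline[str(manifest)] = self._digest(manifest)

    def install_permitted(self, manifest: Path) -> bool:
        return self._digest(manifest) == self._baseline.get(str(manifest))
```

A manifest poisoned before `enter_auto_mode` passes `install_permitted` cleanly; only in-session edits trip the check. That asymmetry is the whole attack surface.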
Gal Nagli, a researcher at Wiz, called it plainly: "the open source supply chain is collapsing in on itself." He's not wrong. LiteLLM is infrastructure — it's the kind of dependency that sits in hundreds of production services without anyone auditing its requirements.txt line by line.
Simon flagged this gap in his post on auto mode's announcement. The default allow list explicitly permits package installs from declared manifests. Auto mode's classifier would review pip install requests but not pip install -r requirements.txt under normal conditions. This is a narrow carve-out, not a design oversight — Anthropic knows that blocking pip install entirely would make Claude Code unusable. But it's exactly the gap a typosquatting attack needs.
Anthropic's own docs acknowledge the classifier's limits. "The classifier may still allow some risky actions: for example, if user intent is ambiguous, or if Claude does not have enough context about your environment to know an action might create additional risk," the auto mode documentation states. The company recommends running auto mode in isolated environments — not production.
David Gewirtz at ZDNET characterized it less generously: "auto mode feels like taking away the guardrails while putting up a sign along the edge of the road that says steep cliff."
The commercial pressure is real. Claude Code made $1 billion in revenue in its first six months, per ZDNET's reporting. Permission prompts are friction. At scale, they become a product problem, and Anthropic has a financial incentive to reduce them. Auto mode is the answer: probabilistic safeguards that are good enough to catch the obvious cases while keeping the workflow moving.
It's available now as a research preview on the Team plan, with Enterprise and API access rolling out in the coming days. Anthropic knows the classifier is imperfect — that's what "research preview" means. But the product pressure isn't waiting for the classifier to get better. At $1B in six months, the tolerance for friction is exhausted.
Simon's right that non-deterministic guardrails are a different kind of trust than deterministic sandboxing. Whether that trade-off is the right call depends entirely on what Claude Code is running against — and in an ecosystem where a CI/CD compromise can push a backdoor to 95 million monthly downloads before lunch, the threat model isn't theoretical.
What to watch: whether the classifier model gets updated independently of the main model (it runs on Sonnet 4.6 regardless of what's in the session), how Enterprise policy controls handle the transition for users already in a session, and whether the latency and token overhead become a visible constraint at scale. The "research preview" label buys Anthropic time. It doesn't change what the classifier is: a second opinion, not a guarantee.