Anthropic Walks Back Its Covert Sabotage of Claude Users

Anthropic Walks Back Its Covert Sabotage of Claude Users — type0 | type0

PREVIEWAnthropic Walks Back Its Covert Sabotage of Claude Users · MD

Anthropic built a hidden performance throttle into Claude that was designed to degrade the model's outputs for researchers it suspected of training competing AI systems. The mechanism was not a published policy or a visible refusal. It was a deliberate, user-invisible downgrade, and the research community found out. Now the lab has reversed course, according to WIRED's reporting on the company's on-the-record statement.

The throttle shipped alongside Claude Fable 5, the publicly released model Anthropic described alongside Claude Mythos 5 as part of a dual-track launch this week. Anthropic described the safeguard as aimed at "frontier LLM development" users and designed to detect and degrade distillation attempts — training a smaller AI model off Claude Fable 5's responses — rerouting those requests to a less capable AI model. WIRED characterized the underlying mechanism as "sabotage" and reported it would have covertly limited competitors from using Claude to build rival AI models. Claude already ships with separate, public safety guardrails that reroute cybersecurity, biology, and chemistry queries to a less capable model; the controversy concerns a second, hidden track reserved for suspected frontier competitors.

After the safeguard became public, AI researchers pushed back hard. Dean Ball, a senior fellow at the Foundation for American Innovation and a former White House AI adviser, wrote on X that the policy of degrading performance on ML research without disclosure was shockingly hostile, and that the "secret sabotage" policy undermines Anthropic's overall stance because it limits AI researchers from collaborating on AI safety. Will Brown, research lead at the open-source AI startup Prime Intellect, told WIRED the policy made Anthropic appear to be pulling "the ladder up behind them": the company was effectively telling the public it didn't trust anyone else to do AI research. Brown also noted the restrictions could have hindered the growing ecosystem of third-party evaluation firms that test frontier models for safety, performance, and reliability.

Anthropic's response, delivered to WIRED, was direct. "We're changing Fable 5's safeguards for frontier LLM development to make them visible. We made the wrong trade-off and we apologize for not getting the balance right," the company said, per WIRED. The behavioral change is concrete: users who trip the safeguard will now be told they are being refused or rerouted to a less capable model, rather than silently receiving degraded outputs.

The reversal closes one fight and opens another. Anthropic's underlying concern is on the record: its terms of service bar customers from using Claude to train directly competing models, and that rule is not being abandoned. The dispute was never about whether frontier labs should police competitive training. It was about how they do it, and specifically whether they can quietly degrade the outputs of developers they suspect of crossing the line, without telling them. For one release cycle, at least, the answer from the research community was no.

The model name "Claude Fable 5" is confirmed as Anthropic's actual published naming, per WIRED's separate launch reporting. What is on the record is the mechanism, the apology, and the shift from covert degradation to visible refusal. The next question is whether visible refusal is enough, or whether researchers will press for the safeguard to be removed entirely.

Anthropic Walks Back Its Covert Sabotage of Claude Users

Sources