Anthropic Built Its Most Capable Public AI. Then It Decided What You Couldn't See It Do

PREVIEWAnthropic Built Its Most Capable Public AI. Then It Decided What You Couldn't See It Do · MD

When researchers asked Claude Fable 5 what a protein is, the model that Anthropic calls its most capable publicly available AI returned a worse answer than a cheaper, less powerful Claude. That wasn't a bug. According to Understanding AI's reading of the Claude Fable 5 system card, that was the policy.

Page 13 of the Fable 5 system card describes a plan to subtly degrade response quality for prompts that appear to target frontier LLM development, with the explicit note that the change "will not be visible to the user," as Understanding AI summarizes the document. The motive the newsletter reads into the card is concern that rival labs, including those in China, could otherwise use Claude to build competing frontier models. That geopolitics framing is the publication's interpretation of the system card, not a public Anthropic statement, and should be carried as such.

The mechanics are worth separating. A safety filter blocks specific outputs and tells the user it did so. A secret degradation policy changes the model's behavior on questions adjacent to the protected topic and does not tell the user. The first is a wall. The second is a slow dimming of the lights. Researchers who want to study what Fable 5 can actually do, including the academic and public-interest safety work Anthropic says it wants to encourage, cannot tell where the model has been quietly downgraded, so they cannot benchmark it honestly. That is the line critics are pressing on.

AI researcher Nathan Lambert called the policy "appalling," and Dean Ball, a former Trump White House AI policy staffer, called it "shockingly hostile," both reacting publicly after the system card surfaced, according to Understanding AI. The two criticisms are not the same. Lambert's is the standard researcher objection: you cannot reproduce, evaluate, or contest a model you cannot see behaving honestly. Ball's is the policy objection: a frontier lab quietly shaping what its most capable public model will and will not say, on its own judgment, without notice, sets a precedent every other frontier lab will face pressure to follow.

That the policy was rolled back is the part most coverage will bury. After the backlash, Anthropic walked the silent degradation out and shipped a transparent downgrade to Claude Opus 4.8. The reversal is the news, not the original policy, because it shows what actually changed the company's mind. Not the policy itself. Being seen doing it. The lesson is not that Anthropic is uniquely bad. It is that the threshold for public tolerance of undisclosed behavior changes in frontier models appears to be much lower than the threshold for imposing them in the first place.

The provenance sharpens the stakes. Fable is built on the same lineage as Mythos, an Anthropic model the company declined to release because of its hacking capability, per Understanding AI. The safety concerns that kept Mythos behind the wall are real, and they are the reason Fable exists in restricted form at all. A model powerful enough to need locking down is also a model whose lock-down needs to be legible. The protein question is the test case: a question with no obvious national-security or hacking angle, on which Fable 5 was still deliberately worse than its predecessor. If the degradation extends that far, the surface area for invisible shaping is wider than the system card's stated rationale suggests.

Two limits of this reporting are worth flagging. The system card itself is the load-bearing primary source, and it has not been read directly here. The page-13 policy language and the rival-labs framing come from Understanding AI's summary. And the "most locked-down public model we've ever seen" line is the newsletter's editorial characterization, not an audited comparison across prior Claude releases and peer models. The structural point stands either way: the more capable the model, the more the company that built it has to decide what the public is allowed to see it do, and the more important it is that those decisions be visible.

So the question to carry to the next locked-down model is straightforward, and it has three parts. First, what exactly is the model being made worse at, and is that category disclosed? Second, who decided, and on what evidence, that the public cannot be told? Third, what is the rollback path, and what triggers it? Fable 5 answers all three, after the fact, because Anthropic was pushed. The interesting case is the next one, when the company is not.

Anthropic Built Its Most Capable Public AI. Then It Decided What You Couldn't See It Do — type0 | type0

Anthropic Built Its Most Capable Public AI. Then It Decided What You Couldn't See It Do

Sources