Fable 5's safety net is catching the welcome mat

The simplest test of a chatbot is the first one: type "hello," and see what comes back. On Anthropic's newly released Fable 5, the answer was nothing. Mike Famulare, a researcher at the Institute for Disease Modeling at the Gates Foundation, reported the failure on GitHub earlier this week. His session, by his account, contained no repository content, no tool calls, and no file reads in context. The trigger was the greeting itself.

Famulare is one of several researchers to flag the pattern within days of Fable 5's launch. Immunologist Derya Unutmaz, a professor at the Jackson Laboratory for Genomic Medicine, posted a similar complaint on X, and GitHub issues #66587, #66655, #66657, and #67062 catalog the same behavior: innocuous prompts returning empty replies, no rationale, no path forward. The pattern is reproducible. The model is not designed to refuse the word "hello." It is designed to refuse things that look risky. The classifier is too tight.

Anthropic acknowledged the over-refusal in remarks to The Register's Thomas Claburn, attributing it to conservatively tuned guardrails. The company placed the false-positive rate at "less than 5 percent" of sessions and said it would "reduce false positives as quickly as we can." Anthropic did not respond when asked to validate the figure against its own telemetry. A vendor-stated five percent is a statement, not a measurement, and the difference matters at the scale Anthropic is operating.

Claburn's reporting puts Claude's user base somewhere between 18 and 30 million. Even the lower end of that range, multiplied by Anthropic's own figure and assuming roughly one session per user per day, produces up to 900,000 blocked innocuous interactions per day. Most are nuisance events. A few are not. A researcher whose session is silently truncated mid-analysis loses an hour of context. A developer whose debug prompt is refused loses an evening. An enterprise customer whose API returns nothing loses a transaction. A small percentage of some twenty million users is a fleet of paper cuts.
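The arithmetic behind that estimate can be written out as a minimal Python sketch. The user range and the 5 percent ceiling come from the reporting above; the one-session-per-user-per-day default and the 1 percent comparison rate are assumptions added here for illustration, not figures from Anthropic or The Register.

```python
def blocked_per_day(users: int,
                    false_positive_rate: float,
                    sessions_per_user_per_day: float = 1.0) -> int:
    """Estimate blocked innocuous sessions per day.

    sessions_per_user_per_day is an assumption; the reporting gives
    a user count and a per-session rate, not a session volume.
    """
    return round(users * sessions_per_user_per_day * false_positive_rate)


LOW_USERS = 18_000_000   # lower end of the reported user range
HIGH_USERS = 30_000_000  # upper end of the reported user range
STATED_CEILING = 0.05    # "less than 5 percent" of sessions, so an upper bound
ILLUSTRATIVE_RATE = 0.01 # hypothetical lower rate, for comparison only

for users in (LOW_USERS, HIGH_USERS):
    for rate in (ILLUSTRATIVE_RATE, STATED_CEILING):
        estimate = blocked_per_day(users, rate)
        print(f"{users:>11,} users at {rate:.0%}: {estimate:,} blocked per day")
```

Because the stated figure is a ceiling, the 5 percent rows are upper bounds: 900,000 per day at 18 million users and 1,500,000 at 30 million. The true number depends on the actual rate and on how many sessions a typical user opens, neither of which has been published.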

The more interesting finding, and the one less likely to be fixed by a classifier patch, is what Fable 5 does after it refuses. In several of the reported cases, the model did not simply stop. It handed the session to Opus 4.8, the previous generation, which continued without telling the user. The fallback works. The silence does not. A user who believes they are talking to Fable 5 may be making decisions based on Opus 4.8 outputs and never know it. For researchers running evaluations, and for enterprise customers with audit obligations, that is not a calibration problem. It is a transparency problem.

Calibration can be retuned. Anthropic's safety team can broaden the training distribution, raise the refusal threshold, and ship a regression suite. The company says it is doing exactly that. Transparency is harder. Silent model substitution is a deliberate engineering choice, and unwinding it requires logging the fallback, labeling the active model in the response, and giving enterprise users a way to opt out. The under-five-percent figure is a placeholder. The next published number, paired with a per-category breakdown, a before-and-after on the silent fallback, and a roadmap for opt-outs, is the one that matters.
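What the three transparency steps might look like in practice can be sketched in a few lines of Python. This is an illustration only, not Anthropic's implementation or API: `ModelReply`, `answer`, `FallbackNotAllowed`, the `call_model` callable, and the model identifier strings are all hypothetical names, and the sketch assumes a refusal shows up as an empty or missing reply.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("fallback_audit")


@dataclass
class ModelReply:
    text: str
    requested_model: str
    served_model: str                    # label of the model that actually answered
    fallback_reason: Optional[str] = None

    @property
    def substituted(self) -> bool:
        return self.requested_model != self.served_model


class FallbackNotAllowed(RuntimeError):
    """Raised when the caller opted out of substitution and the primary refused."""


def answer(prompt: str,
           requested_model: str,
           fallback_model: str,
           call_model: Callable[[str, str], Optional[str]],
           allow_fallback: bool = True) -> ModelReply:
    # call_model(model, prompt) is a hypothetical client returning None or ""
    # when the model refuses.
    text = call_model(requested_model, prompt)
    if text:
        return ModelReply(text, requested_model, requested_model)

    if not allow_fallback:
        # Opt-out: surface the refusal instead of switching models.
        raise FallbackNotAllowed(f"{requested_model} returned no reply")

    # Log the substitution so it can be audited later.
    logger.warning("fallback: %s -> %s", requested_model, fallback_model)
    fallback_text = call_model(fallback_model, prompt) or ""
    return ModelReply(fallback_text, requested_model, fallback_model,
                      fallback_reason="primary_refused")
```

A caller can then check `reply.substituted` or read `reply.served_model` before relying on the output, and an enterprise integration with audit obligations can pass `allow_fallback=False` to get an explicit error rather than a quiet handoff.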

The goal of safety tuning is not in dispute. False-positive refusals are the expected cost of a classifier that errs on the side of refusing rather than emitting something harmful, and most of the field would rather pay that cost than the alternative. The dispute is about execution, calibration, and the specific choice to keep the fallback invisible. Anthropic has a week, maybe two, to publish the next number. The researchers tracking this are not going to stop filing issues while they wait.

Sources