Anthropic's Fable 5 takes the benchmark crown and doubles the price to wear it
A 5.7% Artificial Analysis gain over Opus 4.8 lands with token costs that have roughly doubled. Whether the premium pays off depends on the workload, not the leaderboard.
Anthropic put Claude Fable 5 at the top of the Artificial Analysis Intelligence Index this month, and the company is asking customers to pay roughly twice what they paid for the model it replaces. The 5.7 percent gain over Claude Opus 4.8, measured across Artificial Analysis's ten-evaluation suite, is real. So is the doubled token bill, and that combination, not the leaderboard crown, is the story.
According to Anthropic's June 9 launch post, Fable 5 is priced at $10 per million input tokens and $50 per million output tokens. Reporting from the-decoder on June 12 puts Opus 4.8's prior list price at $5 and $25, which means the new per-token rates are a clean 2x on both input and output. Anthropic also released a sibling model, Claude Mythos 5, which uses the same underlying weights with cyberdefense-focused safeguards lifted and is initially being distributed to defenders through Project Glasswing.
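To make the per-token rates concrete, here is a minimal sketch of what one request costs at each list price. The model keys are illustrative labels rather than API identifiers, and the request shape (2,000 tokens in, 500 out) is an assumption chosen for the example; only the four prices come from the reporting above.

```python
# Per-million-token list prices in USD, as quoted in the article.
# The dictionary keys are illustrative labels, not real API model IDs.
PRICES = {
    "fable-5": {"input": 10.0, "output": 50.0},
    "opus-4.8": {"input": 5.0, "output": 25.0},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD of a single request."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical request shape: 2,000 tokens in, 500 tokens out.
new = request_cost("fable-5", 2_000, 500)
old = request_cost("opus-4.8", 2_000, 500)
print(f"Fable 5: ${new:.4f}  Opus 4.8: ${old:.4f}  ratio: {new / old:.2f}x")
```

Because both rates doubled, the ratio stays at 2x whatever the mix of input and output tokens; the mix only changes how large the absolute bill is.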
The full-benchmark-run cost tells a similar story. Running the entire Artificial Analysis Intelligence Index at maximum reasoning on Fable 5 costs about $9,940, the-decoder reports, citing Artificial Analysis. Running the same suite on Opus 4.8 cost about $4,970. So price-to-performance, measured as benchmark run cost divided by index points, has gotten worse: with the run cost doubling and the score rising 5.7 percent, a buyer is now paying roughly 1.9 times as much per index point on Fable 5 as on Opus 4.8.
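The change in cost per index point can be checked without knowing either model's absolute score, since only the ratio of run costs and the relative score gain matter. The sketch below assumes the 5.7 percent figure is a relative gain in index score; the function name is made up for illustration.

```python
def cost_per_point_ratio(new_run_cost: float, old_run_cost: float,
                         relative_gain: float) -> float:
    """Ratio of new to old cost per index point.

    cost_per_point = run_cost / score, so the ratio is
    (new_cost / old_cost) / (new_score / old_score),
    and new_score / old_score = 1 + relative_gain.
    """
    return (new_run_cost / old_run_cost) / (1.0 + relative_gain)


# Run costs as reported; 0.057 assumes the 5.7% gain is relative.
ratio = cost_per_point_ratio(9_940, 4_970, 0.057)
print(f"Cost per index point: {ratio:.2f}x the previous model")
```

Under that assumption the multiple comes out a little under 2x, because the doubled run cost is spread over a slightly higher score.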
That worsening ratio is the practical question for anyone shipping product on top of Anthropic. A 5.7 percent lift on an aggregate index does not translate evenly into real workloads. For tasks that sit close to the benchmark's evaluation mix (multi-step reasoning, code synthesis, long-context retrieval), buyers will likely see a gain in that range. For tasks bottlenecked by output verbosity, latency, or domain-specific knowledge, the gain could be much smaller, and the net return could turn negative once you account for the doubled output-token rate, which is the more expensive side of the API.
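One way to frame the workload question is to ask how much an extra successful task has to be worth before the upgrade pays for itself. The sketch below is a simplified break-even model, not a method from Artificial Analysis or Anthropic: it assumes each task either succeeds or fails, and every number in the example (per-task costs, the 80 percent baseline success rate, and the idea that the 5.7 percent index gain carries over directly to task success) is hypothetical.

```python
import math


def breakeven_value(cost_old: float, cost_new: float,
                    success_old: float, success_new: float) -> float:
    """Minimum USD value of one extra success for the upgrade to break even.

    Solves value * (success_new - success_old) = cost_new - cost_old.
    """
    extra_cost = cost_new - cost_old
    extra_success = success_new - success_old
    if extra_cost <= 0:
        return 0.0       # new model is no more expensive: any gain pays off
    if extra_success <= 0:
        return math.inf  # costs more and succeeds no more often: never pays off
    return extra_cost / extra_success


# Hypothetical workload: $0.0225 per task before, $0.045 after,
# success rate rising from 80% by a relative 5.7%.
value = breakeven_value(0.0225, 0.045, 0.80, 0.80 * 1.057)
print(f"Break-even value per extra success: ${value:.2f}")
```

If the measured gain on a given workload is smaller than the index gain, the break-even value climbs quickly, and it becomes infinite when there is no gain at all.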
The pattern is not new. The piece from the-decoder notes that Opus 4.7 to 4.8 followed the same shape, a small measured gain on Artificial Analysis paired with a step-change in pricing, and that Anthropic itself called the 4.8 lift "modest but tangible" in its own release notes. Fable 5 is the third release in that sequence. If the trend holds, the next model will offer another incremental gain at another doubled step in cost, and the question for buyers shifts from "is the new model worth the price" to "at what aggregate cost does the marginal benchmark point stop paying for itself."
The narrow gap to the competition matters too. Artificial Analysis has GPT-5.5 roughly five points behind Fable 5 on the same index, according to the-decoder's reading of the leaderboard. If that gap holds, Fable 5's per-index-point premium buys a five-point lead over the strongest non-Anthropic option, not a commanding one. For workloads where the alternative is "good enough," the price difference compounds fast at production scale.
No benchmark suite fully captures real-world ability, and the Artificial Analysis Intelligence Index is no exception. Ten evaluations, weighted and combined, do not see what your users see. The honest version of the decision is: benchmark leadership is a price tag, not a recommendation. Whether that price tag is worth paying depends on the workload, the alternative models' scores on the same index, and how much of the bill is output tokens, the more expensive side of a price list that doubled on both input and output.
What to watch next is whether enterprise customers route around the new pricing (longer caching, smaller context windows, lighter model selection for non-critical paths) or absorb it. Anthropic's framing of the prior gain as "modest but tangible" is the closest thing to a roadmap signal in the release, and the next model's pricing will tell buyers whether 2x-per-release is the new normal or the high-water mark.