Anthropic's Claude Fable 5 hits 88% on the hardest AI math benchmark, up from below 10% months ago

Six months ago, Anthropic's best model cracked fewer than one in ten of the hardest problems on FrontierMath, an expert-authored math benchmark widely cited as one of the toughest tests of AI reasoning. On Friday, its newest model, Claude Fable 5, was reported to have hit 88% on the same tier. The acceleration, not the leaderboard, is the story.

FrontierMath, operated by Epoch AI, is a curated set of problems written by working mathematicians and designed to resist the pattern-matching that carried earlier large language models. Its "tier 4" problems are the hardest subset; the current variant is labeled v2, and scores are not directly comparable across versions. In early 2026, Anthropic's Opus 4.5 scored below 10% on tier 4. By June 13, 2026, Claude Fable 5 reportedly reached 87% on tiers 1–3 and 88% on tier 4 on the same test.

That puts Fable 5 about 13 points ahead of OpenAI's GPT-5.5, which the-decoder reports reaches roughly 75% on tier 4. GPT-5.6 is reportedly in development at OpenAI, though the company has not confirmed timing. All comparisons used Epoch AI's standard scaffold with maximum reasoning effort, which is load-bearing context: this is benchmark performance with heavy scaffolding support, not autonomous mathematical reasoning.

A few caveats belong near the top of any honest reading. FrontierMath is curated, and the current variant is v2, whose scores are not directly comparable with earlier versions, so gains can reflect scaffold and reasoning-effort changes as well as model capability. Epoch AI is the primary source for the figures cited; the upstream reporting is a single outlet (the-decoder) summarizing Epoch AI's leaderboard. Anthropic has not, as of this writing, published its own confirmation of the "Fable 5" naming or the 88% figure, and the model name is unusual; the more familiar Anthropic family in this period includes Claude Opus 4.5 and its successors.

The harder question is what tier 4 still measures once multiple frontier models clear it. A benchmark designed to be unsolvable becomes, by construction, less useful the moment it is solved. If the curve from below 10% to 88% in roughly six months holds across the next cycle, the next move belongs to the benchmark operator: harder problems, new problem families, or some acknowledgment that the leaderboard has stopped ranking capability in the way it once did.

There is a separate, real-world beat worth watching. Recent reporting has suggested that both an OpenAI model and an Anthropic model referred to as "Claude Mythos" have tackled a longstanding Erdős problem, one of a class of open conjectures in discrete mathematics named for the prolific twentieth-century mathematician Paul Erdős. The "solved" framing in the source material needs primary-source verification (which problem, what kind of solution, and on whose authority) before any claim of mathematical breakthrough can stand. Until then, it is a depth beat, not a fact.

What to watch next: Epoch AI's next leaderboard update, any Anthropic confirmation of the Fable 5 release, and OpenAI's first public numbers for GPT-5.6. The slope of the curve is the story; the names on the leaderboard will keep changing.
