The Mathematician Who Submitted This Problem Couldn't Solve It. GPT-5.4 Pro Did.
A seven-year-old conjecture from a hypergraph theory paper defeated its own author repeatedly — until an AI found the construction he'd suspected might work but couldn't make rigorous.

For the first time, an AI system has solved a problem on FrontierMath: Open Problems — Epoch AI's benchmark of genuine unsolved research mathematics — and the mathematician who contributed the problem is now planning to publish the solution in a journal.
According to Epoch AI's announcement, the problem was a conjecture from a 2019 paper co-authored by Will Brian and Paul Larson. Brian, who submitted the problem to the benchmark and rated it "Moderately Interesting," had tried and failed to resolve it multiple times since the paper was published. Kevin Barreto and Liam Price first elicited the solution from GPT-5.4 Pro; Brian confirmed it was correct. They have the option to be listed as co-authors on any resulting publication.
The Problem Itself
The problem falls in Ramsey theory — a branch of combinatorics concerned with the conditions under which order necessarily appears in large structures. Specifically, it involves a function H(n), which measures the maximum size of a hypergraph (a generalization of a graph where edges can connect more than two vertices) that has no isolated vertices and no partitions of size greater than n.
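To make the objects concrete, here is a minimal sketch of a hypergraph as a collection of edges (each edge a set of vertices, possibly more than two) with a check for the "no isolated vertices" condition. This is illustrative only: the example hypergraph is not the extremal construction, and the "no partitions of size greater than n" condition is omitted because the article does not define it precisely.

```python
# A hypergraph represented as a set of edges; each edge is a frozenset of
# vertices, and an edge may contain more than two vertices (unlike a graph).
def vertices(edges):
    """All vertices that appear in at least one edge."""
    return set().union(*edges) if edges else set()

def has_isolated_vertices(vertex_set, edges):
    """A vertex is isolated if it belongs to no edge."""
    return not vertex_set <= vertices(edges)

# Illustrative 3-uniform hypergraph on five vertices (not the construction
# from the paper).
edges = {frozenset({1, 2, 3}), frozenset({3, 4, 5})}
print(has_isolated_vertices({1, 2, 3, 4, 5}, edges))  # → False
print(has_isolated_vertices({1, 2, 3, 4, 5, 6}, edges))  # → True (6 is isolated)
```

The extremal question behind H(n) asks how large such a structure can be while satisfying both conditions; the solved problem concerned improving the known lower bound on that maximum.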
The best-known lower bounds for H(n) were, according to Epoch AI's problem page, believed to be suboptimal — even asymptotically. The challenge was to find a new hypergraph construction that improves those bounds by a constant factor.
The AI didn't just confirm an existing approach — it found the construction. Brian's response, quoted in full on the problem page, is worth reading:
"This is an exciting solution to a problem I find very interesting. I had previously wondered if the AI's approach might be possible, but it seemed hard to work out. Now I see that it works out perfectly. It eliminates an inefficiency in our lower-bound construction and in some sense mirrors the intricacy of our upper-bound construction. The matching lower and upper bounds are quite good for Ramsey-theoretic problems, and I'm interested in further understanding why this works out so well."
That last sentence — "I'm interested in further understanding why this works out so well" — is a mathematician saying that the AI's solution has opened new questions for him. Not just closed one.
What the Benchmark Is and Why This Matters
FrontierMath: Open Problems launched in late January 2026 with 14 problems, all contributed by professional mathematicians who rated them by significance and described their own prior attempts to solve them. Epoch AI's launch post explains the benchmark's design philosophy: problems are not selected to be hard for AI specifically; they are problems the contributing mathematicians want solved on mathematical grounds. Whether AI can solve them is an empirical question the benchmark is meant to answer.
That design choice matters for interpretation. This isn't a problem engineered to be a capability showcase. Brian submitted it because he genuinely wanted it solved and thought it was worth solving. The AI found the answer.
The benchmark's significance ratings run from "Moderately Interesting" (publishable in a specialty journal, likely to generate new questions) up to "Major Breakthroughs." Brian's problem sits in the lowest tier, and that calibration matters: this is not a Fields Medal result. It is a genuine, peer-confirmed, publication-worthy advance in a real subfield.
Reproducibility Across Models
After Barreto and Price's initial elicitation, Epoch AI tested the problem in their evaluation scaffold and found that three other frontier models can solve it as well: Gemini 3.1 Pro, GPT-5.4 (xhigh), and Opus 4.6 (max). The full transcript of GPT-5.4 Pro's original solution and a PDF write-up of the solution are publicly available.
That reproducibility across model families is significant. This isn't a one-shot output, a hallucination that happened to fool a tired reviewer: multiple independent models, tested in a controlled scaffold, derive the same result, and Brian confirmed it.
According to Epoch AI's benchmarks page, GPT-5.4 Pro already holds a record on the original FrontierMath benchmark — 50% on Tiers 1–3 and 38% on Tier 4 — which consists of extremely hard but human-solvable problems designed for top experts. FrontierMath: Open Problems is a harder category entirely: problems no human has solved.
The Coauthorship Question
One detail in the announcement is easy to skim past but shouldn't be. Barreto and Price — the two people who figured out how to prompt GPT-5.4 Pro into producing the solution — have the option to be listed as co-authors on Brian's publication, alongside Brian himself.
That's a real question the math community is going to have to work through. The AI generated the mathematical content; two humans designed the elicitation; one mathematician verified and will write it up. Authorship norms in mathematics have been relatively stable for a long time. They weren't designed for this.
Analysis: The significance here isn't that AI has "solved mathematics." FrontierMath: Open Problems has 14 problems. One has been solved, in the lowest significance tier, after two months. The other 13 remain open. What this result does establish is that the transition from "AI solves problems humans can solve" to "AI solves problems humans haven't solved" has already begun — it's not a future event. The calibration question now is how often, at what significance tier, and whether the rate accelerates. Epoch AI built the benchmark specifically to make that question measurable. Today's announcement is the first data point.

