OpenAI built a math AI that couldn't say what it had proven

OpenAI built a math AI that couldn't say what it had proven.

That is the story behind the May 20 announcement that an OpenAI reasoning model had disproved the Erdős planar unit distance conjecture, a geometry problem posed in 1946. The model produced a proof and an infinite family of counterexamples. What it could not produce was the explicit numerical bound another mathematician could actually use. Princeton mathematician Will Sawin supplied that number the same day: δ = 0.014.

The gap between what the AI generated and what a human had to supply is the real story.

OpenAI said its model had disproved the conjecture, providing an infinite family of examples that yield a polynomial improvement over prior results. The math community validated the core result. Tim Gowers, a Fields Medalist at the University of Cambridge, called it a milestone in AI mathematics. The proof uses specialized techniques from algebraic number theory, including Golod-Shafarevich theory and infinite class field towers, both standard but advanced tools in the field. According to Scientific American, it would merit publication in a top mathematical journal even if a human had produced it alone.

OpenAI's version gave only an inexplicit exponent greater than 1. A footnote deep in the OpenAI writeup states the result was "not made explicit": the model produced a bound of the form n^(1+δ) for some δ > 0, without supplying a number. The main text of the blog post does not flag this as a limitation requiring external correction. None of OpenAI's published materials says whether its researchers examined the gap before publication or discovered it only in post-publication review. No OpenAI researcher has publicly explained why the model could not extract an explicit value, and that absence is itself part of the record.
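In symbols, the difference between an inexplicit and an explicit result looks roughly like this. It is a sketch of the shape of the statements only: u(n) is the usual notation for the maximum number of unit distances among n points in the plane, and the quantifiers and the absence of constant factors are assumptions, not quotations from OpenAI's manuscript or Sawin's preprint.

```latex
\begin{align*}
\text{Conjecture (usual form):}\quad & u(n) = n^{1+o(1)} \\
\text{Inexplicit disproof:}\quad & \exists\,\delta > 0:\ u(n) \ge n^{1+\delta} \ \text{for infinitely many } n \\
\text{Explicit disproof:}\quad & u(n) \ge n^{1+0.014} \ \text{for infinitely many } n
\end{align*}
```

The second line is enough to refute the conjecture. Only the third gives another researcher a number to build on.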

The distinction between the model's original chain-of-thought reasoning and the rewritten version OpenAI published matters for evaluating autonomy claims. OpenAI's own writeup describes the process as follows: the model's output went to an AI grading pipeline that reported high confidence, and only then did internal human researchers begin examining the result. The manuscript that followed is described as "a human-edited exposition of the autonomously produced solution, with references, reorganized proofs, and additional explanatory material added afterward." The chain-of-thought document OpenAI published is labeled a rewritten version; the original reasoning the model used to arrive at the proof is not available. What the rewrite left out, and whether the model's original reasoning path would have clarified or concealed the inexplicit exponent, cannot be determined from the public record.

This pattern — a proof-shaped output that cannot supply the explicit constants another researcher needs — may represent a recurring failure mode in AI mathematical research. Number theorists are already using the AI-generated proof as a starting point, extending the approach by hand to reach results the model could not state explicitly. Whether this limitation is unique to this result or endemic to current reasoning models remains an open question.

The result also arrived with a credibility discount. In October 2025, mathematician Thomas Bloom at the University of Bristol called an earlier OpenAI claim about Erdős problems "a dramatic misrepresentation." When Melanie Matchett Wood worked through the AI's counterexample herself, she said that if experts had spent the same time seeking a counterexample from the start rather than parsing the AI's output, they would have found one.

Daniel Litt was blunter. Writing in Scientific American, he called this "the unique interesting result produced autonomously by AI so far." The qualifier is doing work.

The proof manuscript OpenAI released presents its main result as "there exists an absolute constant δ > 0" and does not state an explicit numerical value. The verification team working with OpenAI (Lijie Chen, Mark Sellke, and Mehtaab Sawhney) first said the δ obtained would likely be very small, and later estimated it at approximately 6 × 10⁻³⁸. The explicit δ = 0.014 now in circulation came from outside that team: Sawin's preprint, posted to arXiv the same day, supplied the first explicit numerical statement in the public record. His preprint and OpenAI's blog post both appeared on May 20; neither references the other.
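A quick back-of-the-envelope calculation shows how far apart those two values of δ are in practice. The sketch below is illustrative arithmetic only: it assumes a bound of the bare form n^(1+δ) with no constant factor, and the function names are hypothetical. It asks how much n^(1+δ) exceeds n at a million points, and how large n must be before the improvement reaches a factor of two.

```python
import math

def gain_log10(n: float, delta: float) -> float:
    """log10 of n**delta, the factor by which n**(1+delta) exceeds n."""
    return delta * math.log10(n)

def doubling_log10_n(delta: float) -> float:
    """log10 of the n at which n**delta first reaches 2."""
    return math.log10(2) / delta

# The two values of delta reported in the article (assumed bare form n**(1+delta)).
for delta in (0.014, 6e-38):
    print(
        f"delta={delta:g}: "
        f"log10(gain at n=1e6)={gain_log10(1e6, delta):.3g}, "
        f"gain doubles near n=10^{doubling_log10_n(delta):.3g}"
    )
```

Working in logarithms keeps the tiny exponent from vanishing into floating-point rounding. With δ = 0.014 the improvement doubles somewhere past 10^21 points; with δ near 6 × 10⁻³⁸ the exponent of that threshold is itself a number with dozens of digits.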

The δ matters because mathematics runs on explicit answers, not proof-shaped objects. OpenAI's model found a proof. It just could not read the number out of it.

