OpenAI Solved an 80-Year-Old Math Problem. The Mathematicians Are Still Checking.
OpenAI says it solved an 80-year-old math problem. Mathematicians are checking.
The company announced in May that its reasoning model had produced an original proof disproving a geometry conjecture first posed by Paul Erdos in 1946. The mathematicians who flagged OpenAI's last major math claim seven months ago, when GPT-5 was found to have rediscovered solutions that already existed in the literature, are vouching for this one.
That should settle it, except it does not. The broader data on AI math capability tells a different story: on a benchmark of ten research-level problems set by professional mathematicians, AI models correctly solved two. Twenty percent.
The two-for-ten record comes from First Proof, a project launched by eleven mathematicians including Fields Medal winner Martin Hairer and Mohammed Abouzaid of Stanford. The problems were unpublished, drawn from active research, and designed to require genuine mathematical creativity rather than pattern-matching to existing literature. AI systems produced confident proofs for all ten. Only two held up.
"We did not expect the AI companies would take it this seriously and put this much labor into it," Abouzaid said.
The formal version of that test is happening now. First Proof's second batch began blind refereeing in late May; human mathematicians are rating each AI solution without knowing which system submitted it. Results are expected in June.
The process matters because there is no standard mechanism for validating AI math claims. Labs announce breakthroughs in blog posts. Mathematicians respond, sometimes years later, in papers. The current precedent is OpenAI's October 2025 episode, when former VP Kevin Weil posted that GPT-5 had solved ten open Erdos problems, only for Thomas Bloom of the Erdos Problems website to call it a dramatic misrepresentation. Nothing caught it in advance.
"They are good at scouring big lists of problems for low-hanging fruit," Terence Tao of UCLA said of current AI models. "Scattered successes among a big sea of unreported failures."
The successes are real. Ernest Ryu, a mathematician at UCLA, used GPT-5 to prove that a method proposed by Yurii Nesterov in 1983 actually converges, a 42-year-old problem in optimization theory. It took twelve hours of human verification work and multiple rounds of AI-human dialogue. Ryu has since taken a leave of absence to join OpenAI's technical staff. Thang Luong of Google DeepMind has said he hopes AI and mathematicians could jointly win a Fields Medal by 2030.
Fields Medal winner Akshay Venkatesh is less sanguine. "There are valuable things in our culture which we should try to keep," he told Quanta Magazine, without specifying what he meant.
That question of what, exactly, needs protecting is animating a broader values discussion in the mathematical community. Tao and co-author Tanya Klowden published a philosophical essay in March noting a forthcoming Leiden Declaration on the use of AI and formalization in mathematics, developed by mathematicians including those who have spent careers advancing the field's standards.
The declaration is not yet published. The refereeing results are not yet in. What is in is a pattern: labs announce, mathematicians check, and the gap between announcement and verification remains a structural feature of how AI math claims reach the public.
OpenAI's May 20 proof is currently backed by independent mathematicians. Whether it survives blind review, and whether the field develops the infrastructure to catch the next false claim before it trends, is the question the June refereeing results should begin to answer.