The 1-2% problem: What Terence Tao learned from watching AI do math
Fifty problems solved. The number sounds like momentum. Terence Tao, the Fields Medal-winning mathematician at UCLA, has been tracking the tally on a public GitHub repository, and the count has been climbing steadily since late 2025. But the number that should actually get attention is the one Tao slips into a footnote in almost every conversation he has about AI and mathematics: 1 to 2 percent.
That is his estimate of the systematic success rate when AI tools are pointed at any given open problem in pure mathematics. Not cherry-picked wins. Not the cases that make press releases. An honest accounting of what happens when you run the tools across a problem set and count the outcomes. The figure is Tao's synthesis of systematic studies across the field, not the result of a single controlled study. He is characterizing a body of literature, not reporting one experiment.
The results are being tracked on Tao's public Erdős problem dashboard. Scientific American reported in February 2026 that roughly 100 problems had received AI-assisted progress since October 2025 alone, double the roughly 50 Tao cited on the Dwarkesh Patel podcast recorded in March 2026. The acceleration is real. But AI did not solve 100 problems unaided. The count includes problems where AI found useful partial progress, identified relevant prior literature, or located a previously overlooked connection. Pure AI solutions, where the system generated a correct proof with no human in the loop, are rarer than the headline count suggests. TechCrunch documented that, of the 1,179 open Erdős problems Tao tracks on his GitHub wiki, he counted just eight cases where AI models made meaningful autonomous progress, plus six more where the contribution was identifying and building on prior research. That is fourteen cases in a set of 1,179 problems, or about 1.2 percent.
"Whenever we do a systematic study, on any given problem an AI tool has a success rate of maybe 1 percent or 2 percent," Tao said in a wide-ranging interview on the Dwarkesh Patel podcast, recorded in March 2026. "It's just that they can buy scale, and you just pick the winners. It looks great."
This is the part of the AI-math story that does not travel in press releases. The wins get amplified. The systematic failure rate gets absorbed silently into the background noise of a field that is not yet very good at publishing its own errors. The Decoder separately reported Tao estimating that roughly 1 to 2 percent of currently open Erdős problems sit in the sweet spot where today's AI tools, with minimal human guidance, can reach a correct proof.
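Tao's "buy scale, pick the winners" remark comes down to simple arithmetic, which the sketch below tries to make concrete. It is a toy model, not anything Tao or the studies he cites have published. It assumes every problem is an independent trial with the same success probability, which real problem sets almost certainly violate, and the function names are made up for illustration.

```python
# Toy model of "buy scale, then pick the winners".
# Assumption (ours, not Tao's): each problem is an independent trial
# with the same per-problem success probability.


def expected_wins(num_problems: int, success_rate: float) -> float:
    """Expected number of solved problems across the whole set."""
    return num_problems * success_rate


def chance_of_any_win(num_problems: int, success_rate: float) -> float:
    """Probability that at least one problem in the set gets solved."""
    return 1.0 - (1.0 - success_rate) ** num_problems


if __name__ == "__main__":
    # 1,179 is the problem count cited in the article; 100 is an
    # arbitrary smaller batch chosen only for illustration.
    for rate in (0.01, 0.02):
        wins = expected_wins(1179, rate)
        any_win = chance_of_any_win(100, rate)
        print(
            f"rate={rate:.0%}: about {wins:.1f} expected wins in 1,179 problems; "
            f"P(at least one win in 100 problems) = {any_win:.2f}"
        )
```

Under these assumptions, a 1 to 2 percent rate across 1,179 problems yields somewhere between about 12 and 24 expected wins. That is enough to fill a year of announcements while roughly 98 percent of attempts fail out of sight.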
The plateau was predictable. There was a period, Tao recalls, when pure AI solutions to open problems appeared to arrive in rapid succession. Those were the problems sitting at the bottom of the difficulty distribution, with enough prior literature and enough partial results that a frontier model could find its way to a solution without human guidance. The low wall. Then it stopped.
"There was a month where that happened and that has stopped, not for lack of trying," Tao said. Three separate groups ran frontier models against the full problem set simultaneously. They found minor observations. They rediscovered work already in the literature. They did not crack the remaining walls.
The reason is not mysterious, and it has nothing to do with the quality of the models. It is the structure of mathematical knowledge itself.
The bottleneck that was always there
Tao draws a direct analogy to the history of science. The internet drove the cost of communication to near zero. AI has done the same for hypothesis generation: coming up with possible explanations, sketching proof strategies, suggesting which prior results might be relevant. Both transformations are real. Both are incomplete.
"It does not create abundance by itself," Tao said. "Now the bottleneck is different. We are now in a situation where suddenly people can generate thousands of theories for a given scientific problem. Now we have to verify them, evaluate them."
The generation-verification asymmetry is ancient. Johannes Kepler, working with Tycho Brahe's painstaking astronomical observations in the early 17th century, proposed one geometric theory of planetary orbits after another. Most were wrong. The Platonic solid model, his first serious attempt, failed to fit Brahe's data by roughly 10 percent. He spent years generating candidates before landing on the three laws that survive in textbooks today. The filter was the data. The generation was cheap. The verification was expensive, slow, and human.
This is exactly what AI has not changed about mathematics. A model can propose a proof strategy, generate thousands of candidate approaches, and surface relevant lemmas from an enormous literature. What it cannot do is sit with a failed proof for six months, feel that something in the structure is wrong, try three different reformulations, and then suddenly see why none of them can work. That is not a failure of current models. It is a description of what mathematical insight actually is.
The issue shows up in how AI failures work. A model does not have persistent memory of what it tried and why it failed. Each session starts fresh. A new attempt does not build on the intuition generated by the previous failure. In Tao's framing, this is the difference between jumping very high and climbing: "What they can't do is jump a little bit, reach some handhold, stay there, pull other people up, and then try to jump from there. There isn't this cumulative process which is built up interactively."
The asymmetry the hype cycle keeps missing
The framing problem in AI-and-math coverage is structural. AI companies report wins. They do not publish systematic failure studies. A result where AI solves one of the world's hardest unsolved problems generates a paper, a press release, and social media virality. A result where the same system fails on 99 consecutive attempts generates nothing. The selection bias is severe, and Tao is one of the few people in a position to see both sides of it.
The practical consequence is a persistent gap between what the field believes AI math tools can do and what they demonstrably do on average. The wins set expectations. The misses are invisible. Tao's 1-2 percent figure is not a technology limitation that will lift with the next model generation. It is a feature of the problem distribution. The easy problems are solved. The hard ones require something that current AI systems, by design, do not accumulate.
This shapes what AI can productively do in a mathematician's day. Tao's own workflow has changed, but not in the way the headline announcements suggest. His papers now contain significantly more code, numerical work, and visualization than they did five years ago. A plot that would have taken hours to generate can now be done in minutes. A literature search that would have required weeks of library time happens in an afternoon. He estimates that the kind of paper he writes today would take five times longer without AI assistance. But on the core work of actually solving an open mathematical problem, the contribution of AI is so far nil.
"The type of papers that I would write today, if I had to do them without AI assistance, would definitely take five times longer," Tao said. "But I would not write my papers that way. They have really sped up lots of secondary tasks. They have not yet sped up the core thing that I do. It has allowed me to add more things to my papers. By the same token, if I were to write a paper I wrote in 2020 again, it actually has not saved that much time, to be honest. It has made the papers richer and broader, but not necessarily deeper."
Tao's 2023 prediction that 2026-level AI would function as a trustworthy co-author in mathematical research has, in his own assessment, largely come in on schedule. The qualifier that gets lost in the retelling is the phrase "when used properly." What he described in March 2026 sounds less like a peer and more like a powerful research assistant: one that cheerfully handles the tedious cases, runs the literature searches, generates the first-draft code, and does not object to grunt work. It is a useful colleague. It is not a mathematician.
A cognitive Copernican revolution
Tao describes the current moment as a cognitive Copernican revolution: the recognition that human intelligence is not the canonical form of intelligence against which all others are measured.
"Right now we're going through a cognitive version of the Copernican revolution, where we used to think that human intelligence is the center of the universe, and now we're seeing that there are very different types of intelligence out there with very different strengths and weaknesses," he said.
That reordering is genuinely disorienting, because human intuition was calibrated against the assumption that the hard problems in science require exactly the kind of thinking that humans are good at. If the bottleneck was always verification, then the things that made human mathematicians valuable were precisely the things that could not be automated. The things that can be automated were the things that were, in some sense, always the easier part.
Tao's prescription for the field is not to wait for better models. It is to redesign how mathematics is organized around the asymmetry that already exists. AI is exceptionally good at breadth. Humans are exceptionally good at depth. The current institutional structure of mathematical research was designed around what humans can do, which means it maximizes for depth and has no good mechanism for breadth. That is now the variable to optimize.
"We should have a lot more effort in creating very broad classes of problems to work on rather than one or two really deep, important problems," Tao said. The idea is almost a reversal of the current research funding model: use AI to map entire problem territories first, identify which walls are three feet tall and which are mile-high cliffs, and then concentrate human attention on the ones that matter and resist automated approaches. The remaining 1,079-plus Erdős problems are not all equally hard. They are distributed across a difficulty landscape that has never been systematically charted. AI can chart it. Humans can then climb the walls that matter.
This reframes what the AI science revolution actually means in fields like drug discovery, materials science, or climate modeling, where the narrative has focused on AI generating candidate hypotheses faster. If the bottleneck was always verification, then the relevant AI capability is not the ability to propose a million molecular structures per hour. It is the ability to evaluate them. That is a different engineering problem, a different investment thesis, and a much slower path. AI made hypothesis generation cheap. Verification is, and always was, the bottleneck.
Whether that is a comforting or disturbing conclusion depends on what you thought the story was about.