Terence Tao says AI is already changing who gets to do mathematics
Terence Tao and Google DeepMind's AI just improved solutions to 23 of 67 mathematics problems, producing a result that no human mathematician had found. Three days later, Tao told Nature that the profession is already sorting people by whether they can work with tools whose output they may not fully trust. The two events are separate. Taken together, they describe a field that has crossed a threshold without agreeing on what it means.
Tao and Tanya Klowden's new worry has a name: citogenesis. The mechanism is simple enough to be dangerous. A researcher asks an AI system to summarize the literature on an obscure problem. The system produces a plausible but partly synthetic summary. Someone uploads that summary to a public repository. A future training run scrapes it as ground truth. The next model learns from the residue of the first model's mistake, and a false citation begins to look like part of the mathematical record. They describe the danger in a 27-page arXiv preprint, an unabridged version of a Blackwell Companion article. Tao wrote on his blog that the paper took more than a year to write and was already slightly out of date by the time it appeared.
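The feedback loop can be sketched as a toy simulation (my construction, not from the preprint; the corpus size and hallucination rate are purely illustrative): each generation, a model trained on the corpus republishes claims, a small fraction of them fabricated, and the next generation trains on the enlarged corpus as if all of it were genuine.

```python
import random

def train_and_publish(corpus, hallucination_rate, n_outputs, rng):
    """One 'generation': a model trained on the corpus emits summaries.

    Each output either repeats a claim already in the corpus or, with some
    probability, fabricates a new one. All outputs are published back into
    the corpus, where the next training run treats them as ground truth.
    """
    outputs = []
    for _ in range(n_outputs):
        if rng.random() < hallucination_rate:
            outputs.append(("fabricated", rng.random()))  # synthetic claim
        else:
            outputs.append(rng.choice(corpus))  # repeated claim (genuine or not)
    return corpus + outputs

rng = random.Random(0)
corpus = [("genuine", i) for i in range(1000)]  # the original literature
for generation in range(5):
    corpus = train_and_publish(corpus, hallucination_rate=0.05,
                               n_outputs=1000, rng=rng)
    fabricated = sum(1 for kind, _ in corpus if kind == "fabricated")
    print(f"generation {generation + 1}: "
          f"{fabricated / len(corpus):.1%} of corpus is fabricated")
```

The point of the sketch is that the fabricated share grows faster than the raw hallucination rate, because later generations also repeat earlier fabrications as if they were sources.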
That caveat is not a footnote. It is the point. Mathematics is trying to reason about systems moving faster than its own publication cycle. Tao told Nature that AI is forcing mathematicians to rethink what a proof is, what a paper is, and what the profession is for. A graduate student who refuses to use AI tools and wants to prove things only the old way may have fewer opportunities, he said. That is a labor-market claim, not a sci-fi claim.
The clearest evidence that the warning matters is the AlphaEvolve result. Tao and Google DeepMind used a system that evolves computer programs with help from Gemini models to search for improvements on known solutions. On 23 of 67 problems, AlphaEvolve found small improvements, according to Quanta Magazine. One of those improvements was a new mathematical result, a genuine discovery made with machine assistance. In February, a separate challenge called First Proof gave AI models one week to solve ten research-level math problems chosen to avoid training-data contamination; the models solved more than half, Quanta reported. These are not toy benchmarks. They are research-grade problems, solved under conditions designed to make cheating difficult.
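AlphaEvolve's internals are not public in full detail, but the evolve-and-select pattern it rests on can be shown with a toy hill climber (entirely illustrative, with random mutation standing in for the LLM): mutate the best-known candidate, keep any mutation that scores better. Here the stand-in problem is spreading five points in the unit square to maximize the smallest pairwise gap, a miniature of the optimization-style questions in such problem sets.

```python
import math
import random

def min_pairwise_distance(points):
    """Score a configuration: the smallest gap between any two points."""
    return min(math.dist(p, q)
               for i, p in enumerate(points)
               for q in points[i + 1:])

def evolve(n_points=5, steps=2000, seed=0):
    """Toy evolutionary search: mutate the best-known solution and keep
    only mutations that improve the score."""
    rng = random.Random(seed)
    best = [(rng.random(), rng.random()) for _ in range(n_points)]
    best_score = min_pairwise_distance(best)
    for step in range(steps):
        scale = 0.1 * (1 - step / steps) + 0.001  # shrink mutations over time
        candidate = [
            (min(1.0, max(0.0, x + rng.gauss(0, scale))),
             min(1.0, max(0.0, y + rng.gauss(0, scale))))
            for x, y in best
        ]
        score = min_pairwise_distance(candidate)
        if score > best_score:  # selection: keep only improvements
            best, best_score = candidate, score
    return best, best_score

points, score = evolve()
print(f"best min-distance found for 5 points: {score:.3f}")
```

The real system replaces random mutation with LLM-proposed program edits and runs the loop at scale, but the selection pressure, keep what verifiably scores better, is the same idea.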
The preprint's broader argument is that AI is a natural continuation of older tools for creating, organizing, and spreading ideas, from notation to libraries to search engines. That framing is useful but not novel — philosophers have made similar arguments about writing and the printing press. What the paper adds is specificity about the risks. Beyond citogenesis, Tao and Klowden describe what they call odorless proofs: AI-generated arguments that satisfy formal verification, meaning a separate system can mechanically check each step, while lacking the heuristic cues that help human mathematicians understand why an argument works. A proof can be correct and teach almost nothing. For a profession that advances by turning one proof into intuition for the next problem, that is a practical problem, not a philosophical one, according to the arXiv preprint.
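The odorless-proof worry can be made concrete with a toy Lean 4 snippet (my illustration, not from the paper): both proofs below pass the formal checker, but only the second leaves behind the idea a human could reuse.

```lean
-- Both proofs are formally verified; only the second records the idea.

-- "Odorless": a decision procedure certifies the fact outright,
-- leaving no human-readable reasoning in the source.
example : (2 : Nat) ^ 10 > 1000 := by decide

-- Structured: the key computation (2 ^ 10 = 1024) is stated explicitly,
-- so a reader sees why the inequality holds.
example : (2 : Nat) ^ 10 > 1000 := by
  have h : (2 : Nat) ^ 10 = 1024 := rfl
  rw [h]
  decide
```

Scaled up to research-length arguments, the first style verifies but transfers no intuition, which is exactly the gap the preprint names.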
They also propose a phased workflow. In a near-term phase they call Vanilla Extract, the mathematician still designs the argument and uses AI at the margins for search or cleanup. In a more advanced Red-Team phase, AI generates candidate proofs while formal verification systems and human experts try to break them. The human role shifts from solitary proof-builder toward architect, critic, and curator. Whether mathematicians follow that path or resist it may determine whether the productivity gains come with epistemic losses.
The denial, anger, bargaining, and depression phases are already visible in the community, Tao told Nature, but acceptance is beginning to arrive. The counterforce is that almost every dramatic claim in this space ages badly. Benchmarks improve, systems fail in odd places, and today's impossible theorem can become tomorrow's demo without telling us whether the tool understands anything. Tao's own blog caveat applies to the preprint itself: even a careful paper by a Fields Medalist can be overtaken by the systems it studies before the formal version reaches print.
The next thing to watch is not whether AI can produce another impressive proof demo. It is whether mathematicians build provenance rules, verification habits, and publication norms fast enough to keep machine-generated knowledge from eating its own tail. If they do not, the profession may get more productive and less sure of what it knows at the same time.