Mathematics is becoming the cleanest natural experiment for AI in any intellectual field, not because it is special, but because it is unusually legible. On the Dwarkesh Podcast, the mathematician Grant Sanderson, creator of the YouTube channel 3Blue1Brown, argues that math is also the most diagnostic case. It already shows both the layers AI is compressing and the layer it cannot, and every other knowledge field carries the same layered structure.
The compression layer is the easy one to see. Off-the-shelf models now solve International Mathematical Olympiad problems in minutes, problems from a structured competition that historically consumed the strongest pre-college mathematicians. A writeup at RITS Shanghai NYU describes an AI system winning gold at the 2025 International Mathematical Olympiad. On a different frontier, OpenAI has said a model disproved a long-standing central conjecture in discrete geometry. Both wins are formalizable in the same way: they have clean right-or-wrong answers and short proofs that are machine-checkable in seconds.
But math also has a layer those benchmarks do not reach. A gold-medal olympiad problem is only a five-page solution away from being checked. A deep human proof of a major conjecture is something different: not just verification, but the slow, century-long project of understanding what a result means, why it is true, and where it sits in the surrounding web of ideas. That work is the layer where mathematics has produced its biggest leaps. Sanderson's framing on the podcast is that math capability is jagged, strong on certain problems and weaker on the conceptual breakthroughs that historically moved the field, and that this second layer is not the kind of work a reward loop can compress.
That observation is what makes math a useful proxy. Every intellectual field has a formalizable surface and a conceptual core. In code, the surface is boilerplate and test-fix loops; the core is the architectural decision that reorganizes a codebase. In legal writing, the surface is citation and clause assembly; the core is the argument a court has never seen. In math, the surface is the olympiad problem; the core is the proof that takes a generation to understand. Because the layered structure is unusually visible in mathematics, the field becomes a microscope for what AI is doing to knowledge work everywhere.
Two open questions sit on top of the frame. The first is whether AI, on net, increases or decreases human understanding of a field. The compression layer is real and growing; whether it trains a larger population of humans to think more carefully, or substitutes for thinking and leaves the conceptual core to atrophy, is an empirical question the next decade will settle. The second is the size of the research overhang, the corpus of latent connections already present in the literature that no individual researcher has time to read. Sanderson raised this on the podcast as a genuinely open question. If AI is now systematically trying to connect ideas already present in the literature, the productive question stops being what the next theorem is and starts being where in the corpus the next connection is hiding. A preprint on reinforcement learning with verifiable rewards, the family of training techniques in which a model improves against checkable answers like math proofs, is one signal that this is the path labs are walking down.
The honest version of the frame has its limits. Sanderson is careful not to turn the math analogy into occupational claims; his Substack carries the same argument into writing and other fields without naming which professions disappear. Dwarkesh's adjacent episode on reinforcement learning with verifiable rewards (RLVR) and science belongs to the same running AI-math thread and is worth flagging for readers who have not followed the broader pattern.
What the math analogy gives the reader is a vocabulary. Three layers: a compressible surface where AI is already winning, a jagged middle where some moves are AI-shaped and some are not, and a small conceptual core where work is slow, human, and resistant to compression. Every knowledge field has those layers. What is new is that the surface is being compressed in real time.