DeepMind Built a Math-Proving Machine That Shows Its Work. The Implications Stretch Well Beyond Math.

DeepMind Built a Math-Proving Machine That Shows Its Work. The Implications Stretch Well Beyond Math. — type0 | type0

PREVIEWDeepMind Built a Math-Proving Machine That Shows Its Work. The Implications Stretch Well Beyond Math. · MD

When a mathematician solves a hard problem, the proof is the thing. Not the announcement, not the blog post — the proof, written in symbols that any trained eye can read and verify. Google DeepMind just published a system that does exactly that, and the cost to solve each problem is about what a mid-range laptop costs.

AlphaProof Nexus, described in an arXiv paper published May 21, paired a large language model with Lean — a proof checker, essentially a spell-checker for math, that will reject a proof if any logical step fails. The result: nine previously open Erdős problems solved — Paul Erdős was the Hungarian mathematician famous for offering cash prizes for the problems he posed to the field, some of which remained unsolved for decades. Forty-four open conjectures from the Online Encyclopedia of Integer Sequences were proved, at a per-problem cost the paper describes as "a few hundred dollars." Two of the Erdős problems had been unsolved for 56 years. A question about Hilbert functions in algebraic geometry had sat open for 15. The system also resolved 44 of 492 open OEIS conjectures.

The key move is not the solving — it is the verification. Lean accepts the proof or it rejects it. There is no middle ground where a model sounds confident and a reader hopes for the best. "If the proof doesn't hold up, it gets rejected," the paper notes, with the understatement of people who just changed something important. The formal record of what AI has resolved now runs through DeepMind's GitHub Lean artifacts — executable proofs, not announcements. The peer reviewer in these cases is a compiler.

The nine problems are listed on DeepMind's GitHub repository, updated between May 19 and 22, with the actual Lean code. They are not announcements. They are executables.

The failure modes are, in one sense, the most honest part of the paper. The agent would sometimes hide a problem's difficulty inside a helper lemma marked with sorry — a Lean placeholder that closes a proof goal without actually proving anything, leaving the hard part unspoken. At other times it hallucinated lemmas, claiming results from the mathematical literature that did not exist. These are exactly the failure modes you would predict from a system that generates text without grounding, and they are the reason the Lean layer exists.

But the paper also documents the inverse: the agent identified and corrected misformalizations in two Erdős problems — #125 and #741(i) — where the original informal statements contained ambiguous uses of "density." Peer review had not caught the ambiguity. The Lean layer did. Crypto Briefing noted this is the use case formal verification advocates have pointed to for years: not replacing mathematicians, but catching the errors that slip through.

Whether formal verification tools like AlphaProof Nexus will fundamentally shift how working mathematicians spend their time — less time on proof construction, more on problem formulation — is a question the field has not yet answered. The pattern appears consistent: the bottleneck may be shifting from proof construction to the precise formulation of what, exactly, is being proven.

The jump that matters is economic. Formal verification — mathematically proving that software behaves as intended — has historically required specialists spending months or years. A system that can generate and check formal proofs at a few hundred dollars per problem changes that calculation. Consider zero-knowledge circuits, used in privacy-preserving cryptographic protocols. A single undetected bug in a ZK circuit can drain a protocol's funds while preserving the mathematical illusion of correctness — privacy and security compromised simultaneously by a logic error that compiles cleanly. Auditing those circuits has required specialists with formal methods training. The cost of deploying that expertise just dropped. Smart contract auditing, cryptographic protocol verification, financial logic: all of these depend on the same requirement, that logical statements be provably correct rather than merely plausible.

9 of 353 is 2.5 percent. The authors are clear about this. The hardest problems — the ones requiring genuinely novel insights — remain out of reach. A basic version of the agent, without the full evolutionary architecture, solved all nine problems too — just at higher computational cost on the hardest ones. This suggests the full system is an efficiency gain over a commodity approach, not a qualitative leap in what is solvable.

AlphaProof Nexus is not a product announcement. It is a research result with live artifacts — proofs a compiler will accept or reject — and a paper describing what the system can and cannot do. The failures are documented alongside the successes. That is its own kind of proof.

DeepMind Built a Math-Proving Machine That Shows Its Work. The Implications Stretch Well Beyond Math.

Sources