DeepMind Built Better Poker Bots. Nobody Knows Why.
DeepMind's Algorithm Finder Found Something Humans Can't Explain
Google DeepMind built a system that finds new game theory algorithms humans have not yet explained.
AlphaEvolve, described in a paper submitted to arXiv in February 2026, uses Gemini 2.5 Pro as an evolutionary engine: the LLM generates mutations to existing multi-agent learning algorithms, candidate variants are evaluated in simulation, and the best performers are bred forward. Repeated across thousands of generations, the process produced two algorithm variants, VAD-CFR and SHOR-PSRO, that match or beat human-designed baselines in 10 of 11 and 8 of 11 game environments, respectively. Both are now published in arXiv 2602.16928.
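The mutate, evaluate, and select cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's system: the candidates here are plain numbers rather than algorithm source code, and `toy_mutate` and `toy_evaluate` are hypothetical stand-ins for the LLM-generated mutations and the game simulations.

```python
import random

def evolve(seeds, mutate, evaluate, generations=50, population=8, rng=None):
    """Generic mutate-evaluate-select loop with elitism."""
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    pool = list(seeds)
    for _ in range(generations):
        # Propose variants of current survivors (the paper uses an LLM here).
        children = [mutate(rng.choice(pool), rng) for _ in range(population)]
        # Score parents and children together and keep the best performers.
        pool = sorted(pool + children, key=evaluate, reverse=True)[:population]
    return pool[0]

# Hypothetical toy problem: candidates are numbers, and higher scores
# mean closer to 3.0. Neither function comes from the paper.
def toy_mutate(x, rng):
    return x + rng.gauss(0.0, 0.5)

def toy_evaluate(x):
    return -(x - 3.0) ** 2

best = evolve([0.0], toy_mutate, toy_evaluate)
print(best, toy_evaluate(best))
```

Because parents stay in the pool alongside their children, the best score never gets worse from one generation to the next. That is a design choice of this sketch, not a claim about how AlphaEvolve manages its population.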
The numbers are real. The WIRE headline was not.
The paper, authored by Zun Li, John Schultz, Daniel Hennes, and Marc Lanctot, is a genuine contribution to the multi-agent reinforcement learning literature. VAD-CFR outperforms Discounted Predictive CFR+ and related baselines across a suite of poker variants, Goofspiel, and Liar's Dice. SHOR-PSRO matches or beats the Uniform, Nash, AlphaRank, PRD, and RM meta-solvers on 8 of 11 evaluation games, according to MarkTechPost's analysis of the results. The training set was deliberately narrow: 3-player Kuhn Poker, 2-player Leduc Poker, 4-card Goofspiel, and 5-sided Liar's Dice. The evaluation set included larger, unseen variants of those games. The algorithms generalized.
But the WIRE claim that "researchers built AI that designs better AI than humans can" is not what this paper shows. AlphaEvolve is not designing neural architectures, discovering new model families, or writing foundation model code. It is searching a structured space of mathematical algorithms in a well-defined domain. That is impressive. It is also the same class of result as DeepMind's AlphaTensor, which discovered faster matrix multiplication algorithms in 2022, and AlphaCode, which wrote novel solutions to competitive programming problems. AI has been finding things humans missed in narrow mathematical spaces for years. The framing "AI designs AI" keeps showing up because it generates engagement, not because it accurately describes what is happening.
What is actually interesting is the interpretability problem the results create.
VAD-CFR postpones policy averaging until iteration 500. That threshold was not hand-designed by a researcher who knew the 1,000-iteration evaluation horizon. Gemini 2.5 Pro generated it as part of a mutation, and the evolutionary process kept it because it performed well. Nobody on the team appears to have worked out why waiting 500 iterations before averaging produces better convergence than averaging from iteration one. The algorithm works. The mechanism is not yet fully understood.
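To make "postponing policy averaging" concrete, here is a small sketch of the idea on a toy problem. It is not VAD-CFR: it uses plain regret matching, the basic building block of CFR, on rock-paper-scissors, a game that is not in the paper's benchmark suite. The `avg_start` parameter, the starting regrets, and every function name are assumptions made for illustration.

```python
# Row player's payoffs in rock-paper-scissors (zero-sum, game value 0).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def regret_matching_strategy(regrets):
    """Play in proportion to positive regret; uniform if none is positive."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n

def action_values(opponent, as_row):
    """Expected payoff of each pure action against a mixed opponent."""
    if as_row:
        return [sum(PAYOFF[a][b] * opponent[b] for b in range(3)) for a in range(3)]
    return [sum(-PAYOFF[a][b] * opponent[a] for a in range(3)) for b in range(3)]

def nash_conv(row, col):
    """Sum of both players' best-response gains; zero at equilibrium."""
    return max(action_values(col, True)) + max(action_values(row, False))

def run(iterations=1000, avg_start=0):
    if not 0 <= avg_start < iterations:
        raise ValueError("avg_start must be in [0, iterations)")
    # Asymmetric starting regrets so play does not sit at uniform forever.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    sums = [[0.0] * 3, [0.0] * 3]
    for t in range(iterations):
        row = regret_matching_strategy(regrets[0])
        col = regret_matching_strategy(regrets[1])
        if t >= avg_start:  # the delayed-averaging switch
            sums[0] = [s + p for s, p in zip(sums[0], row)]
            sums[1] = [s + p for s, p in zip(sums[1], col)]
        updates = [(row, action_values(col, True)), (col, action_values(row, False))]
        for player, (mine, values) in enumerate(updates):
            expected = sum(p * v for p, v in zip(mine, values))
            # Regret of each action: its value minus what the strategy earned.
            regrets[player] = [r + v - expected for r, v in zip(regrets[player], values)]
    counted = iterations - avg_start
    return [s / counted for s in sums[0]], [s / counted for s in sums[1]]

full_row, full_col = run(1000, avg_start=0)
late_row, late_col = run(1000, avg_start=500)
print("average from iteration 0:  ", nash_conv(full_row, full_col))
print("average from iteration 500:", nash_conv(late_row, late_col))
```

The only difference between the two runs is which iterations count toward the reported average strategy. Whether the delayed version ends up closer to equilibrium in this toy game is an empirical question the script answers when run, and nothing here explains why 500 would be the right threshold in the paper's setting.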
This is a pattern worth naming. DeepMind's own blog post on the paper calls it "automated algorithm discovery" and notes that the evolved algorithms "exhibit qualitatively different behavior from their human-designed counterparts." The authors acknowledge they are "excited to explore the interpretability of these findings," which is a careful way of saying the system found something and the theorists have to catch up.
That gap between performance and understanding is the actual story. It is not "AI designs better AI." It is: the machine found a trick in a well-studied domain, the trick works, and the researchers are still figuring out the math.
The paper also raises a question about what "design" means in algorithmic discovery. AlphaEvolve did not start with a blank canvas. It initialized from human-designed base algorithms and searched the neighborhood around them. The search is guided by a learned prior (Gemini 2.5 Pro's understanding of what algorithm mutations tend to improve performance) and by empirical evaluation in simulation. The output is genuinely new source code that did not previously exist. But the process depends on human-designed evaluation infrastructure and on base algorithms that humans built. Calling the result "AI-designed" in the strong sense requires ignoring everything upstream of the mutation operator.
Whether this constitutes a meaningful boundary crossing will depend on what comes next. The authors note they are exploring applications beyond game theory to problems where the solution space is "richly structured." If the approach generalizes to domains with higher-dimensional solution spaces, the interpretability problem becomes more urgent, not less. Right now, game theory has enough mathematical structure that researchers can eventually reason backward from result to principle. In messier domains such as protein design, chip layout, and system architecture, the gap between what the system found and what anyone can explain could be much wider.
The paper is real. The results are solid. The framing is not. Run the numbers, read the arXiv paper, and decide for yourself whether an algorithm that works but nobody fully understands constitutes a category change, or whether it is what AI-assisted mathematical research has always been, just faster and stranger than before.