DeepMind Built Better Poker Bots. Nobody Knows Why.
AlphaEvolve found two algorithms that beat human-designed baselines in poker and dice games. The DeepMind researchers built them — and cannot fully explain how they work.

Image: generated with Grok
DeepMind's AlphaEvolve uses Gemini 2.5 Pro to generate mutations of multi-agent learning algorithms, breeding variants across thousands of generations. The search produced VAD-CFR and SHOR-PSRO, which match or beat human-designed baselines in most evaluation games. The striking result is an interpretability gap: VAD-CFR postpones policy averaging until iteration 500, a threshold the LLM generated and nobody on the team can explain, yielding a working algorithm whose mechanism is not understood. This is not 'AI designing AI' but search in a structured mathematical space, the same class of result as AlphaTensor discovering faster matrix multiplication.
DeepMind's Algorithm Finder Found Something Humans Can't Explain
Google DeepMind built a system that finds new game theory algorithms its own researchers cannot yet fully explain.
AlphaEvolve, described in a paper submitted to arXiv in February 2026, uses Gemini 2.5 Pro as an evolutionary engine: the LLM generates mutations to existing multi-agent learning algorithms, candidate variants are evaluated in simulation, and the best performers are bred forward. Repeated across thousands of generations, the process produced two algorithm variants, VAD-CFR and SHOR-PSRO, that match or beat human-designed baselines in 10 of 11 and 8 of 11 game environments respectively. Both appear in arXiv preprint 2602.16928.
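For orientation, the loop is closer to classic evolutionary search than to a model training itself. Here is a deliberately toy sketch of the control flow, with stub `mutate` and `fitness` functions standing in; in AlphaEvolve the mutation step is Gemini 2.5 Pro rewriting algorithm source code, and fitness is measured by playing the training games. None of these names are the paper's actual interfaces.

```python
# Toy sketch of the evolutionary loop described above, not the paper's code.
import random

def mutate(candidate: str) -> str:
    # Stand-in for the LLM mutation operator (Gemini 2.5 Pro editing source).
    return candidate + random.choice("abc")

def fitness(candidate: str) -> float:
    # Stand-in for empirical evaluation in simulation
    # (e.g. performance in Kuhn Poker, Leduc Poker, Goofspiel, Liars Dice).
    return random.random()

def evolve(base_algorithms, generations=100, population_size=16, survivors=4):
    # Seeded with human-designed baselines, never a blank canvas.
    population = list(base_algorithms)
    for _ in range(generations):
        children = [mutate(random.choice(population))
                    for _ in range(population_size)]
        pool = population + children
        # Breed the best performers forward into the next generation.
        pool.sort(key=fitness, reverse=True)
        population = pool[:survivors]
    return population[0]

best = evolve(["CFR", "PSRO"])
```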
The numbers are real. The WIRE headline was not.
The paper, authored by Zun Li, John Schultz, Daniel Hennes, and Marc Lanctot, is a genuine contribution to the multi-agent reinforcement learning literature. VAD-CFR outperforms Discounted Predictive CFR+ and related baselines across a suite of poker variants, Goofspiel, and Liars Dice. SHOR-PSRO beats Uniform, Nash, AlphaRank, PRD, and RM meta-solvers on 8 of 11 evaluation games, according to MarkTechPost's analysis of the results. The training set was deliberately narrow: 3-player Kuhn Poker, 2-player Leduc Poker, 4-card Goofspiel, and 5-sided Liars Dice. The evaluation set included larger, unseen variants of those games. The algorithms generalized.
But the WIRE claim that "researchers built AI that designs better AI than humans can" is not what this paper shows. AlphaEvolve is not designing neural architectures, discovering new model families, or writing foundation model code. It is searching a structured space of mathematical algorithms in a well-defined domain. That is impressive. It is also the same class of result as DeepMind's AlphaTensor, which discovered faster matrix multiplication algorithms in 2022, and AlphaCode, which wrote novel solutions to competition programming problems. AI has been finding things humans missed in narrow mathematical spaces for years. The framing "AI designs AI" keeps showing up because it generates engagement, not because it accurately describes what is happening.
What is actually interesting is the interpretability problem the results create.
VAD-CFR postpones policy averaging until iteration 500. That threshold was not hand-designed by a researcher who knew the 1,000-iteration evaluation horizon. Gemini 2.5 Pro generated it as part of a mutation, and the evolutionary process kept it because it performed well. Nobody on the team appears to have worked out why waiting 500 iterations before averaging produces better convergence than averaging from iteration one. The algorithm works. The mechanism is not yet fully understood.
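To see where the threshold sits, it helps to recall that CFR-family algorithms output a time-averaged policy, and that the averaging normally runs from the first iteration. The sketch below is a minimal illustration of the delayed variant, assuming a hypothetical `cfr_iteration` callback in place of the full regret-matching update; it shows the shape of the trick, not VAD-CFR's actual code.

```python
import numpy as np

AVERAGING_START = 500   # the evolved threshold nobody has yet derived
NUM_ITERATIONS = 1000   # the evaluation horizon used in the paper

def delayed_average_policy(initial_policy, cfr_iteration):
    """Schematic of delayed policy averaging. `cfr_iteration` is a
    hypothetical callback performing one regret-matching update and
    returning the new current policy; it is not the paper's interface."""
    policy = np.asarray(initial_policy, dtype=float)
    policy_sum = np.zeros_like(policy)
    count = 0
    for t in range(1, NUM_ITERATIONS + 1):
        policy = cfr_iteration(policy)
        # Standard CFR accumulates from t == 1. The evolved variant skips
        # the first 500 iterates, so they never enter the average that is
        # ultimately returned.
        if t >= AVERAGING_START:
            policy_sum += policy
            count += 1
    # In CFR-family methods the average policy, not the last iterate,
    # is the object that converges toward equilibrium.
    return policy_sum / count
```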
This is a pattern worth naming. DeepMind's own blog post on the paper calls it "automated algorithm discovery" and notes that the evolved algorithms "exhibit qualitatively different behavior from their human-designed counterparts." The authors acknowledge they are "excited to explore the interpretability of these findings" — which is a careful way of saying the system found something and the theorists have to catch up.
That gap between performance and understanding is the actual story. It is not "AI designs better AI." It is: the machine found a trick in a well-studied domain, the trick works, and the researchers are still figuring out the math.
The paper also raises a question about what "design" means in algorithmic discovery. AlphaEvolve did not start with a blank canvas. It initialized from human-designed base algorithms and searched the neighborhood around them. The search is guided by a learned prior (Gemini 2.5 Pro's understanding of what algorithm mutations tend to improve performance) and by empirical evaluation in simulation. The output is genuinely new source code that did not previously exist. But the process depends on human-designed evaluation infrastructure and on base algorithms that humans built. Calling the result "AI-designed" in the strong sense requires ignoring everything upstream of the mutation operator.
Whether this constitutes a meaningful boundary crossing will depend on what comes next. The authors note they are exploring applications beyond game theory to problems where the solution space is "richly structured." If the approach generalizes to domains with higher-dimensional solution spaces, the interpretability problem becomes more urgent, not less. Right now, game theory has enough mathematical structure that researchers can reason backward from result to principle, eventually. In messier domains — protein design, chip layout, system architecture — the gap between what the system found and what anyone can explain could be much wider.
The paper is real. The results are solid. The framing is not. Run the numbers, read the arXiv, and decide for yourself whether an algorithm that works but nobody fully understands constitutes a category change — or whether it is what AI-assisted mathematical research has always been, just faster and stranger than before.