Google DeepMind's AlphaEvolve system, which uses a large language model (LLM) to iteratively rewrite candidate algorithms, has produced two game-theory solvers that beat human-designed baselines across most of a standardized test suite. That is the fact most coverage has reported since MarkTechPost picked up the paper this week. But the paper, submitted to arXiv on February 18, 2026, buries a stranger detail in its appendix, one that nobody has led with: one of the discovered algorithms postpones a key learning step until iteration 500, a threshold the LLM chose without being given the evaluation horizon it would eventually align with.
VAD-CFR, short for Volatility-Adaptive Discounted Counterfactual Regret Minimization, matches or surpasses state-of-the-art performance in 10 of 11 games tested, including three-player Kuhn Poker, two-player Leduc Poker, four-card Goofspiel, and five-sided Liars Dice. The sole exception is four-player Kuhn Poker. SHOR-PSRO, the Smoothed Hybrid Optimistic Regret Policy Space Response Oracle, reaches the same bar in 8 of 11 games. Both were discovered by AlphaEvolve, which uses Google's Gemini 2.5 Pro as a mutation operator to rewrite the Python source code of multi-agent learning algorithms, proposing new update rules rather than tuning existing ones.
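AlphaEvolve's outer loop is an evolutionary search in which the LLM serves as the mutation operator. A toy sketch of that loop, with the LLM step stubbed out as a small random numeric edit and a hypothetical fitness function standing in for the real evaluator, which runs each candidate on the game suite:

```python
import random

def evaluate(decay):
    # Hypothetical fitness with an optimum at 0.9; AlphaEvolve's real
    # evaluator scores a candidate algorithm by running it on games.
    return -abs(decay - 0.9)

def mutate(decay, rng):
    # Stub for the LLM rewrite step: a small random edit to the candidate.
    # In AlphaEvolve this is Gemini 2.5 Pro editing Python source code.
    return min(1.0, max(0.0, decay + rng.gauss(0.0, 0.05)))

def evolve(generations=200, population_size=8, seed=1):
    rng = random.Random(seed)
    population = [rng.random() for _ in range(population_size)]
    for _ in range(generations):
        # Keep the best half, replace the rest with mutated copies.
        survivors = sorted(population, key=evaluate,
                           reverse=True)[: population_size // 2]
        population = survivors + [mutate(rng.choice(survivors), rng)
                                  for _ in survivors]
    return max(population, key=evaluate)
```

The structure is the point, not the stub: selection pressure comes from the evaluator, while the quality of each mutation comes from the model proposing code edits rather than random perturbations.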
The 500-iteration threshold appears in the paper's appendix as a design choice the LLM made independently: the algorithm it produced delays policy averaging until iteration 500, while the evaluation ran to 1,000 iterations. The LLM had no access to that number when it generated the threshold; a footnote states that it "generated this threshold without knowledge of the 1,000-iteration evaluation horizon in its prompt context."
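To make the mechanism concrete, here is a minimal sketch of what delayed policy averaging looks like in a generic CFR-style loop. The constant name, the placeholder traversal, and the regret-matching details are illustrative, not taken from VAD-CFR's actual source:

```python
import numpy as np

AVERAGING_START = 500  # the threshold the LLM chose; evaluation ran to 1,000

def regret_matching(regrets):
    """Turn cumulative regrets into a strategy (uniform if none positive)."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(regrets), 1.0 / len(regrets))

def run(num_iterations=1000, num_actions=3, seed=0):
    rng = np.random.default_rng(seed)
    cumulative_regrets = np.zeros(num_actions)
    strategy_sum = np.zeros(num_actions)
    for t in range(1, num_iterations + 1):
        strategy = regret_matching(cumulative_regrets)
        # Placeholder for the game-tree traversal that would compute
        # counterfactual action values in a real CFR implementation.
        action_values = rng.standard_normal(num_actions)
        cumulative_regrets += action_values - strategy @ action_values
        # Delayed averaging: the average policy only starts accumulating
        # at iteration 500, discarding the noisy early strategies.
        if t >= AVERAGING_START:
            strategy_sum += strategy
    return strategy_sum / strategy_sum.sum()
```

In standard CFR it is the average strategy, not the final one, that approximates equilibrium, so where averaging begins matters: starting late excludes the erratic early iterations, which is why the threshold's relationship to the total horizon is the interesting part.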
The paper does not explain that alignment. The researchers ran the algorithm, it worked, and the footnote flags the coincidence without speculating. That is the appropriate epistemic posture: a documented result with an open question attached.
The broader AlphaEvolve research program is not new. Google announced in May 2025 that the system had found a matrix multiplication algorithm for 4x4 complex-valued matrices using 48 scalar multiplications, the first improvement over Strassen's 1969 result in 56 years. A scheduling heuristic discovered by AlphaEvolve has been running in Google's production data centers for over a year, recovering 0.7 percent of the company's worldwide compute resources. The optimization that sped up a key kernel in Gemini's architecture by 23 percent, reducing overall training time by 1 percent, is documented in the original AlphaEvolve blog post.
What the February 2026 paper adds is a harder test domain. Game-theoretic strategy in imperfect-information games is not a toy problem. The 11-game evaluation suite includes games where players hold private information, where every decision depends on what others might do, and where the optimal strategy cannot be computed directly, only approximated through iterative self-play. CFR, or Counterfactual Regret Minimization, is the foundational algorithm family for this class of problems — it underpins poker AI, auction design, and negotiation systems. Getting a system to improve on it is not a parlor trick.
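The core CFR mechanism, regret matching through self-play, can be shown on the simplest zero-sum toy, matching pennies: each player accumulates regret for actions it did not take, plays in proportion to positive regret, and the average strategies converge toward the mixed Nash equilibrium of (0.5, 0.5). This is a textbook illustration of the mechanism, not the paper's code:

```python
import numpy as np

# Row player's payoff matrix for matching pennies (zero-sum).
PAYOFF = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])

def regret_matching(regrets):
    """Turn cumulative regrets into a mixed strategy (uniform if none positive)."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(regrets), 1.0 / len(regrets))

def self_play(iterations=10_000):
    # A small asymmetric start so the dynamics leave the fixed point.
    regrets = [np.array([1.0, 0.0]), np.zeros(2)]
    strategy_sums = [np.zeros(2), np.zeros(2)]
    for _ in range(iterations):
        s0 = regret_matching(regrets[0])
        s1 = regret_matching(regrets[1])
        # Expected payoff of each pure action against the opponent's mix.
        cf0 = PAYOFF @ s1        # row player
        cf1 = -(s0 @ PAYOFF)     # column player (zero-sum)
        regrets[0] += cf0 - s0 @ cf0
        regrets[1] += cf1 - s1 @ cf1
        strategy_sums[0] += s0
        strategy_sums[1] += s1
    # The *average* strategies, not the current ones, approximate equilibrium.
    return [s / iterations for s in strategy_sums]
```

Full CFR applies this update at every information set of a game tree, weighted by counterfactual reach probabilities; the self-play convergence of average strategies is the property that carries over.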
The paper has four lead authors: Zun Li, John Schultz, Daniel Hennes, and Marc Lanctot, all at Google DeepMind. Sixteen additional co-authors appear on the full author list, including Matej Balog and Pushmeet Kohli. The paper was submitted to arXiv on February 18, 2026, with a second version posted on February 21. It has not yet been peer reviewed.
The story is not that a lab published a paper. The story is that a large language model, asked to design a better learning algorithm, produced a design with a threshold in it that aligned with an evaluation horizon it was never shown. Whether that alignment is meaningful or coincidental is a question the paper does not answer. The appropriate journalistic response is to report the result, note the open question, and stop there.
The practical upshot is that AlphaEvolve is not a one-trick matrix multiplier. It is a general algorithm design tool that has now produced competitive results in two distinct domains: numerical computation and multi-agent game strategy.