The fusion industry has $2.5 billion and forty years of locked data. Only one of those is useful.
DeepMind wants to use AI to unlock forty years of fusion experiments sealed in European archives. The infrastructure to do it exists. The institutional permission does not.
That is the hook. Here is what actually happened when someone tried to use it.
EUDAT, a European data infrastructure project, has been working since at least 2023 to build a public mirror of JET and MAST experimental data from the Culham Centre for Fusion Energy. The project completed an initial phase: a standardized data model, a Python API, and a small sample of MAST-U open-access data uploaded to a B2SHARE training instance. Phase 2 is described as ongoing with no published completion date. The current reality, as EUDAT's own documentation states, is that researchers within the EUROfusion consortium have "controlled access" to JET and MAST data and are "discouraged from creating local copies of the data sets." Outside researchers, including independent AI labs and commercial fusion startups, have no clear path to the data at all.
In 2023, the Joint European Torus shut down after four decades of experiments. It had been, at its peak, the hottest place in the solar system, hitting 150 million degrees Celsius. It generated more energy from a single plasma pulse than any tokamak before it. It also left behind forty years of experimental records, sensor readings, and hard-won plasma physics data. One unnamed expert consulted by Google DeepMind called it a "stranded asset"—locked behind bureaucratic agreements, scattered across unvalidated logbooks, and unavailable for commercial AI use.
The diagnosis is reasonable. The cure remains elusive.
The DeepMind policy essay, published in April 2026 by researchers Conor Griffin, Don Wallace, and Theo Brown through Google DeepMind's public policy team, proposes a methodology called "AI data stocktakes": systematic, expert-driven audits to identify which scientific datasets are worth funding for AI training. The fusion proof-of-concept draws on interviews with twenty-five leading experts and produces eight recommendations, including open-sourcing thirty percent of JET's experimental data by 2028 and launching a competition to predict plasma disruptions. The essay frames the problem as a funder coordination issue, not a technical one. The technical solutions exist.
The FAIR-MAST project, an open-source data management system for MAST tokamak data published by researchers at UKAEA and partner institutions in 2024, demonstrates what AI-ready fusion data infrastructure looks like in practice. It defines metadata standards, uses Apache Parquet and Zarr formats for machine-readable storage, and is open source. It is a working proof that the problem has technical solutions. What it cannot solve is the bureaucratic debt: the web of partner agreements, IP restrictions, and institutional policies that determine who can use the data once it is cleaned.
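To make "AI-ready" concrete: the core of such infrastructure is a machine-readable record describing each experimental shot, pointing at bulk signal data stored in formats like Parquet or Zarr. The sketch below is hypothetical; the field names and values are illustrative inventions, not the actual FAIR-MAST schema.

```python
# A minimal sketch of a shot-level metadata record for tokamak data.
# All field names and values are hypothetical illustrations, not the
# FAIR-MAST schema; the real project pairs records like this with
# signal arrays stored in Apache Parquet or Zarr.
import json
from dataclasses import dataclass, asdict

@dataclass
class ShotRecord:
    shot_id: int          # experiment shot number (invented for this example)
    campaign: str         # experimental campaign label
    signals: list[str]    # diagnostic signal groups available for this shot
    storage_format: str   # e.g. "zarr" or "parquet"
    license: str          # usage terms -- the bureaucratic debt lives here

record = ShotRecord(
    shot_id=30420,
    campaign="MAST-U example campaign",
    signals=["magnetics", "thomson_scattering", "dalpha"],
    storage_format="zarr",
    license="CC-BY-4.0",
)

# Serialize to JSON so a training pipeline can discover what a shot
# contains before fetching the bulk sensor data.
print(json.dumps(asdict(record), indent=2))
```

The point of the sketch is how little of the problem is technical: the schema is trivial, and the `license` field is the one no data format can fill in.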
This matters because the private money has arrived. The DeepMind essay cites the Fusion Industry Association's figure: more than thirty companies are now pursuing commercial fusion power, with $2.5 billion invested in the past year alone. Commonwealth Fusion Systems, TAE Technologies, and Tokamak Energy are all building or operating proprietary reactors. If the foundational training data for AI-accelerated fusion science remains locked inside a European consortium with controlled access, those companies will generate their own proprietary datasets—creating data moats that determine which researchers can meaningfully participate in AI fusion science going forward.
The US government has independently reached the same conclusion. The Genesis Mission, launched by executive order in November 2025, directs the Department of Energy to build an integrated AI platform that aggregates federal scientific datasets for training foundation models. The UK published its own AI for Science strategy in late 2025, including a collaboration with Renaissance Philanthropy to identify priority datasets. Both initiatives pursue the same strategy the DeepMind essay recommends: identify what data AI needs, then pay to make it accessible. That three separate policy efforts have independently converged on "free the scientific data" as the bottleneck suggests the diagnosis is real. What remains unclear is whether any of them can unstick the institutional logjam.
The DeepMind essay names the three debts holding fusion data back. Technical debt: the fusion community prioritized getting machines to work over building data infrastructure. Bureaucratic debt: JET's ownership structure requires agreement from Euratom, EUROfusion, and UKAEA to release data. Human and cultural debt: the research culture that pushes scientists toward the next experiment rather than validating and sharing older data. The essay recommends focusing on projects that governments, companies, and philanthropies could fund within one to two years.
That constraint is honest. But it also means the stranded asset stays stranded. The deeper question the essay leaves open is who gets to be in the fusion AI ecosystem once the private companies finish building their own training sets. Without open data standards, there is no independent benchmark against which to hold private fusion companies accountable to transparent science. The FAIR-MAST authors are explicit on this point.
The AI data stocktake methodology DeepMind proposes is sound. The case that fusion data is a stranded asset is reasonable. But stranded assets stay stranded until someone pays the legal fees to unlock them. The money is available. The technical roadmap exists. What is missing is the institutional agreement—and that is a problem no AI system can optimize around.
The JET data is not going anywhere. The question is whether the research community will ever be allowed to use it.