A dataset of 13.5 million quantum-mechanical calculations on industrial catalysts is now peer-reviewed and free to download, a release aimed at a long-standing gap in the data used to train machine-learning models for chemistry. The dataset, AQCat25, was published in npj Computational Materials, a Nature Portfolio journal, and is hosted on Hugging Face under a Creative Commons license. Its specific contribution, according to the paper's authors, is that it is the first large-scale catalysis dataset to incorporate spin polarization: the imbalance between spin-up and spin-down electrons that makes earth-abundant metals such as iron, cobalt, and nickel magnetic. Earlier training sets typically left it out because the spin-polarized calculations were too expensive to run at scale.
For decades, computational chemists have relied on density functional theory, or DFT, a standard simulation method for predicting how molecules bind to a catalyst's surface. DFT is accurate but slow, and the cost multiplies when a metal is magnetic, since each calculation has to be repeated for different electron spin configurations. That cost pushed most existing datasets toward non-magnetic materials and a handful of precious metals, leaving earth-abundant magnetic metals under-represented in the training data that machine-learning models rely on to propose new catalysts. AQCat25 covers about 47,000 catalyst systems and was generated using roughly 400,000 GPU-hours on NVIDIA DGX Cloud, a compute budget beyond the reach of most academic groups working alone.
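For a rough sense of scale, the reported totals can simply be divided out. The short Python sketch below does only that arithmetic. It assumes the rounded figures quoted in this article and treats the entire GPU budget as if it went to the DFT calculations, a breakdown the announcement does not give; the function name is illustrative, not part of any AQCat25 tooling.

```python
# Back-of-envelope averages from the rounded totals reported for AQCat25.
# Assumption: all reported GPU-hours are attributed to the DFT calculations.
# These are dataset-wide means; individual calculations vary widely in cost.
TOTAL_CALCULATIONS = 13_500_000  # DFT calculations
TOTAL_SYSTEMS = 47_000           # catalyst systems (approximate)
TOTAL_GPU_HOURS = 400_000        # GPU-hours on NVIDIA DGX Cloud (approximate)


def dataset_averages(calcs: int, systems: int, gpu_hours: float) -> dict[str, float]:
    """Return mean calculations per system and mean GPU cost per calculation."""
    gpu_hours_per_calc = gpu_hours / calcs
    return {
        "calcs_per_system": calcs / systems,
        "gpu_hours_per_calc": gpu_hours_per_calc,
        "gpu_minutes_per_calc": gpu_hours_per_calc * 60,
    }


if __name__ == "__main__":
    avg = dataset_averages(TOTAL_CALCULATIONS, TOTAL_SYSTEMS, TOTAL_GPU_HOURS)
    print(f"~{avg['calcs_per_system']:.0f} calculations per system")
    print(f"~{avg['gpu_minutes_per_calc']:.1f} GPU-minutes per calculation")
```

On those assumptions the averages come to about 287 calculations per catalyst system and under two GPU-minutes per calculation. Because spin-polarized runs cost more than non-magnetic ones, real per-calculation costs spread well above and below that mean.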
Catalysts are the workhorses of industrial chemistry: they accelerate the reactions behind fertilizers, fuels, plastics, and pharmaceuticals, and many of those processes depend on scarce, expensive metals such as platinum, palladium, and rhodium. SandboxAQ, the company behind AQCat25, says a magnetically informed dataset at this scale should let researchers screen earth-abundant substitutes more reliably. The "up to 20,000 times faster than first-principles simulation" figure in the company's announcement is a vendor-supplied claim; it has not been independently benchmarked in the materials reviewed. The headline numbers in the paper itself (13.5 million DFT calculations, 47,000 systems, 400,000 GPU-hours) are not in dispute.
The release carries endorsements from company leadership. SandboxAQ chief executive Jack Hidary, in a post on LinkedIn cited in the announcement, called the work a "breakthrough in catalyst discovery and computational chemistry." That framing is the company's, not an independent assessment. What is independently verifiable: the paper's peer-review status, the open license on Hugging Face, and the dataset's size relative to earlier releases in the field.
What changes now is access. A research group that wants to train a model on magnetically aware catalyst data can download AQCat25 from Hugging Face without negotiating a license, which lowers the barrier to entry for academic labs and smaller industrial teams alike. Whether that translates into a working substitute for a platinum-group catalyst in any specific reaction, such as ammonia synthesis, fuel-cell electrochemistry, or hydrocarbon processing, will depend on follow-on work that has not yet been published.