Zyphra's ZAYA1-8B Rewrites the Small-Model Playbook — on AMD Hardware
Every frontier AI model you have heard of was trained on NVIDIA hardware. Zyphra just published a technical report suggesting that is not a prerequisite. The company trained ZAYA1-8B, a reasoning model with competitive benchmark scores, entirely on AMD MI300X GPUs, running across a 1,024-node cluster on IBM Cloud. That is the story. The numbers come second.
On AIME 2025 and HMMT 2025, two mathematics competition benchmarks used to stress-test reasoning chains, ZAYA1-8B scores 91.9% and 89.6% respectively, matching or exceeding DeepSeek-R1-0528 while using a fraction of the compute. The reasoning state it carries forward toward the final output is only 4,000 tokens, not the unbounded transcript typical of chain-of-thought reasoning. The mechanism behind that efficiency is called Markovian RSA: the model reasons in discrete chunks it can generate in parallel, bounded by a fixed-length context window regardless of how long the reasoning chain runs. The constraint is not a ceiling on thinking; it is a structural property that keeps inference predictable.
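To make the shape of that inference loop concrete, here is a minimal sketch of a bounded-context, chunked reasoning loop in Python. It illustrates only the fixed-size carry-over idea; the `generate` callable, the carry-over format, and the token budgets are placeholder assumptions, not Zyphra's published Markovian RSA implementation, and the parallel-aggregation side of RSA is omitted for brevity.

```python
MAX_CARRY = 4_000     # reasoning state carried between chunks (figure from the report)
CHUNK_BUDGET = 1_000  # tokens generated per chunk (assumed value)


def markovian_reason(problem: str, generate, max_chunks: int = 16) -> str:
    """Reason in fixed-size chunks. Only a bounded carry-over survives from
    one chunk to the next, so the prompt never grows with the total length
    of the reasoning chain. `generate` is any callable taking (prompt,
    max_new_tokens) and returning text; token counts are approximated by
    characters to keep the sketch dependency-free."""
    carry = ""  # bounded reasoning state passed between chunks
    for _ in range(max_chunks):
        prompt = f"{problem}\n\n[carry-over]\n{carry}"
        chunk = generate(prompt, max_new_tokens=CHUNK_BUDGET)
        if "FINAL ANSWER:" in chunk:
            return chunk.split("FINAL ANSWER:")[-1].strip()
        # Keep only the tail of the new chunk: the state the next step sees
        # is bounded, which is what makes the process Markovian in size.
        carry = chunk[-MAX_CARRY:]
    return carry  # no explicit final answer within the chunk budget
```

The point of the bounded carry is operational: whatever the model does inside each chunk, the memory and latency profile of the next step stays fixed.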
"Every model you have heard of was trained on NVIDIA hardware," Firethering noted. "The entire open source AI ecosystem has been built on a de facto NVIDIA monopoly." The training cluster size (1,024 nodes) is substantial, and IBM Cloud as the infrastructure provider signals this is not a bespoke lab setup. This is a production-grade run. Zyphra's result does not disprove the NVIDIA framing — it demonstrates that the AMD alternative is viable at competitive performance levels.
The intellectual ancestry here traces to Beren Millidge, Zyphra's Chief Scientist, who completed a PhD in Machine Learning and Computational Neuroscience at the University of Edinburgh and has spent years thinking about bounded rationality and efficient inference. Millidge has written publicly about the gap between how frontier labs actually train and how they present their work publicly. ZAYA1-8B is the lab's answer to the question of whether you can train for depth without scaling compute linearly.
Krithik Puthalath, Zyphra's Founder and CEO, put it this way: "ZAYA1-8B demonstrates what is possible when architecture, pretraining, and reinforcement learning are co-designed toward a single objective: maximizing the intelligence extracted per parameter and per FLOP." That is a clean articulation of the efficiency thesis. Whether the benchmarks bear it out at scale is the open question.
The benchmarks come from Zyphra's own evaluation suite, which is worth keeping in mind. The figures are consistent with established public benchmarks (AIME and HMMT are not proprietary), but they are self-reported, which is standard for pre-release model cards and not, on its own, a reason for skepticism. The independent coverage from VentureBeat and Firethering has not challenged the core figures. What Firethering did flag, and what is worth holding onto: ZAYA1-8B's agentic capabilities lag comparable models. On BFCL-v4, a benchmark that tests tool use and multi-step agentic tasks, ZAYA1-8B scores 39.22; Qwen3-4B-Thinking, a comparable open-source reasoning model, scores 49.7 on the same test. The math and coding story is genuine. The agent story is not yet competitive.
There is a nuance in the Markovian RSA approach that deserves attention: it only works because Zyphra co-trained the model to understand and respond to the Markovian chunking process. When researchers applied the same inference method to Qwen3-4B without that co-training, the performance uplift was significantly smaller. This is not a plug-in upgrade. It is an architectural commitment.
ZAYA1-8B is released under Apache 2.0 and available on Zyphra Cloud and HuggingFace. The model weights are out. The code for running Markovian RSA inference is published. The benchmark results are in the arXiv technical report.
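For anyone who wants to try the weights directly, a minimal loading sketch with Hugging Face transformers follows. The repository id and prompt are assumptions for illustration; check the model card on HuggingFace for the published name, and use Zyphra's released Markovian RSA inference code to reproduce the reasoning-benchmark behavior.

```python
# Hedged example: loading the released weights with Hugging Face transformers.
# The repository id "Zyphra/ZAYA1-8B" is an assumption, not a confirmed name.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Zyphra/ZAYA1-8B"  # hypothetical id; verify on HuggingFace
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Plain `generate` calls like this will not exercise the bounded-context chunking described above; that path goes through the published Markovian RSA inference code.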
Whether ZAYA1-8B generalizes beyond math and coding is the question the next few months of community evaluation will answer. But the AMD training result stands independently: a model trained at meaningful scale on non-NVIDIA hardware that can hold its own on the evaluations that matter for reasoning. The hardware monoculture has a proof-of-concept counterexample. The rest is up to the community.