How ‘Why Not’ Led to a $20 Billion Deal For Groq
Nvidia spent $20 billion on a "why not." That was enough.
At GTC 2026 in San Jose, Jonathan Ross, Groq's CEO and now also Nvidia's chief software architect, told the origin story of one of the largest deals in semiconductor history. The short version: Sunny Madra, Groq's COO, asked Nvidia if it would open its NVLink communication protocol to another AI accelerator company. Jensen Huang's answer, in Ross's telling: "Why not."
That led to a proof-of-concept disaggregating LLM inference workloads between Nvidia GPUs and Groq LPUs. It worked. Ross presented the demo to Huang. Three days later, Huang called. Three weeks after that, the deal was signed. Ross started at Nvidia on December 25th — Christmas Day — laptop in hand.
"Imagine if I had said no," Ross said at the conference.
The architecture was dictated by silicon physics. Groq's LPU design is SRAM-based: it processes tokens quickly, but holding a single model in memory takes many racks of chips, which is expensive at scale. Nvidia's GPUs have high aggregate throughput but cannot reach the highest interactivity levels, meaning the fastest tokens-per-second-per-user numbers, without help. Disaggregation lets each chip do what it does best.
Nvidia has productized this as the Groq 3 LPX Rack, which sits alongside Vera Rubin racks in what Nvidia calls the AI factory. For workloads requiring high interactivity, 200 to 400 tokens per second per user, the combined system delivers up to 35 times higher inference throughput per megawatt than Vera Rubin alone, according to Huang's GTC keynote. The business logic Ross laid out: slow tokens can be free or low-cost, while fast tokens, the ones users experience as instantaneous, command a premium tier. Groq's chips are what make that premium tier possible on Nvidia's hardware.
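The tiering logic can be pictured as a simple routing rule. The sketch below is illustrative only: the function name, the tier and pool labels, and the use of 200 tokens per second per user as a hard cutoff are assumptions made for the example, not anything Nvidia or Groq has described.

```python
from dataclasses import dataclass

# Assumption: 200 tokens/s/user is the lower bound of the "high interactivity"
# range quoted in the keynote. Using it as a hard routing threshold is an
# illustrative choice, not a documented Nvidia or Groq policy.
HIGH_INTERACTIVITY_MIN_TPS = 200.0


@dataclass(frozen=True)
class Route:
    tier: str  # hypothetical pricing tier label
    pool: str  # hypothetical hardware pool label


def route_request(target_tps_per_user: float) -> Route:
    """Pick a pricing tier and hardware pool from a per-user speed target."""
    if target_tps_per_user <= 0:
        raise ValueError("target_tps_per_user must be positive")
    if target_tps_per_user >= HIGH_INTERACTIVITY_MIN_TPS:
        # Fast tokens: served by the combined GPU and LPU system.
        return Route(tier="premium", pool="gpu+lpu")
    # Slow tokens: GPU-only serving, offered free or at low cost.
    return Route(tier="standard", pool="gpu")
```

Under these assumptions, a request targeting 300 tokens per second per user lands in the premium tier on the combined pool, while one targeting 50 stays on GPUs alone.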
Huang projected at GTC that the combined system could eventually drive close to $300 billion in annual revenue per gigawatt for Nvidia customers. That is a keynote projection, not an audited figure. The LP30 chip, now under Nvidia's roof, is the silicon piece of that argument. Ross's early skepticism was not about the idea of disaggregation itself: he was not sure it would work, and engineering bandwidth was tight. Sunny Madra advocated for it with a small team, and Ross said yes to the experiment. The $20 billion outcome suggests it was the right call.
The deal closed in three weeks. The technical case for it is still being written.