Two Chips, One Bet: How Google Split Its TPU Future in Two
9,600 chips. 121 exaflops. Google just built one of the largest AI supercomputers ever, and split its TPU line in two along the way.

Google spent a decade building the Tensor Processing Unit as a single, versatile chip. Today it is splitting it in two.
At Google Cloud Next, the company announced its eighth-generation TPU lineup, split into a training chip (TPU 8t) and an inference chip (TPU 8i), each designed by a different fabless partner. Broadcom handles the training silicon; MediaTek handles the inference silicon, at what analysts at Jon Peddie Research estimate is 20 to 30 percent lower cost than alternative suppliers.
The announcement is technically substantial. According to Google's blog, TPU 8i delivers 80 percent better performance per dollar than its Ironwood predecessor, with 384 MB of on-chip SRAM (three times the prior generation), a new Boardfly interconnect topology that cuts network diameter by more than 50 percent, and a dedicated Collectives Acceleration Engine that reduces on-chip latency by up to five times. TPU 8t superpods scale to 9,600 chips delivering 121 exaflops of compute.
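Those pod-level figures imply per-chip numbers Google did not state outright. A back-of-envelope sketch, assuming the 121 exaflops are aggregate low-precision throughput and scale linearly across the pod (neither of which Google confirms):

```python
# Back-of-envelope arithmetic from the announced TPU 8t superpod figures.
# Assumptions: 121 exaflops is aggregate low-precision throughput and
# scales linearly across the pod; neither is confirmed by Google.

POD_CHIPS = 9_600
POD_EXAFLOPS = 121.0

per_chip_pflops = POD_EXAFLOPS * 1e18 / POD_CHIPS / 1e15
print(f"Implied per-chip throughput: ~{per_chip_pflops:.1f} PFLOPS")
# -> ~12.6 PFLOPS per TPU 8t chip

# SRAM cross-check: 384 MB at "three times the prior generation"
# implies the Ironwood generation carried roughly 128 MB on-chip.
prior_sram_mb = 384 / 3
print(f"Implied prior-generation SRAM: ~{prior_sram_mb:.0f} MB")
```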
But the structural move is the story. Google has explicitly accepted what infrastructure buyers have been demanding: separate silicon for training and inference workloads, each optimized independently rather than compromising both on a single design. Nvidia still dominates training, where CUDA ecosystem lock-in creates switching costs custom silicon cannot easily replicate. Inference is different. Inference is continuous, repetitive, and amenable to fixed-function optimization: exactly the conditions where purpose-built chips win on cost.
"The battleground is shifting towards inference," said Chirag Dekate, an analyst at Gartner, speaking to the Los Angeles Times. "In that battleground, Google has an infrastructure advantage."
That advantage is being purchased at scale. Anthropic has committed to up to one million TPUs, with access to roughly 3.5 gigawatts of next-generation TPU compute starting in 2027. Meta signed a multibillion-dollar, multi-year deal to use TPUs through Google Cloud. Google spent more than $9 billion on TPU development and deployment in 2025, according to Jon Peddie Research.
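The Anthropic numbers also imply a power envelope. A rough division, sketched below, assuming the 3.5 gigawatts cover the full one-million-chip fleet and include cooling and facility overhead (the announcement does not break this down):

```python
# Rough per-chip power budget implied by the Anthropic commitment.
# Assumptions: all 3.5 GW serve the full one-million-chip fleet and the
# figure includes facility overhead (cooling, networking, hosts);
# the announcement does not specify either.

FLEET_CHIPS = 1_000_000
FLEET_POWER_GW = 3.5

watts_per_chip = FLEET_POWER_GW * 1e9 / FLEET_CHIPS
print(f"Implied all-in budget: ~{watts_per_chip:,.0f} W per chip")
# -> ~3,500 W per chip, all-in: a multi-kilowatt envelope consistent
# with accelerator TDP plus host, networking, and cooling overhead.
```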
The competitive threat to Nvidia is real but conditional. Google's 80 percent performance-per-dollar improvement is the company's own claim and has not been independently validated. More significantly, TPU 8i appears to be available only through Google Cloud, not for on-premise deployment, which limits its appeal to customers who want to run inference on their own hardware. The chip's general availability date also remains unclear against a stated Q3 2026 mass-production start, per Wccftech.
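It is also worth being precise about what the headline claim buys. An 80 percent performance-per-dollar gain does not cut an inference bill by 80 percent; a sketch with a hypothetical baseline spend makes the arithmetic concrete:

```python
# What "80 percent better performance per dollar" means for a bill.
# Illustrative only: the baseline spend is hypothetical, and the 1.8x
# factor is Google's own unvalidated claim, applied at constant workload.

PERF_PER_DOLLAR_GAIN = 1.80        # "80 percent better" => 1.8x
baseline_monthly_cost = 100_000.0  # hypothetical Ironwood-era spend, USD

new_cost = baseline_monthly_cost / PERF_PER_DOLLAR_GAIN
savings_pct = (1 - new_cost / baseline_monthly_cost) * 100
print(f"Same workload: ${new_cost:,.0f}/month ({savings_pct:.0f}% lower)")
# -> ~$55,556/month, a ~44% reduction, because a perf-per-dollar gain g
# compresses cost by 1 - 1/g, not by g - 1.
```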
Still, the direction of travel is established. Custom chip sales are projected to grow 45 percent in 2026, compared to 16 percent for GPUs, with the custom AI accelerator market expected to reach $118 billion by 2033, according to TrendForce. Google is not alone: Amazon has Trainium and Inferentia, Microsoft has Maia, Meta has MTIA. The hyperscalers are building their own silicon, and, as The Next Web has reported, the inference market is where the competition is intensifying.
Nvidia's own response, the Groq-licensed inference chip it launched last month, acknowledges the same reality. The question is not whether inference specialization matters. It does. The question is who wins at it, and at what price.