The Unlikely Comeback: How Intel Turned the CPU Into the Traffic Cop of the AI Era
While the AI industry spent the last several years obsessing over GPU counts, Intel was quietly repositioning the chip it never stopped making: the CPU.
At Computex 2026 this week, Intel unveiled rack-scale reference designs, developed with Foxconn and SambaNova, that pack up to 36,864 CPU cores into a 100-kilowatt enclosure (The Register). The headline numbers are absurd, as intended. But the real story is what those cores are actually doing.
They are not running the AI model. They are running the agent.
The Three-Way Split That Changes Everything
The architecture Intel and SambaNova announced is not a GPU replacement play. It is a disaggregated inference system that distributes the work of a single AI request across three different processor types. GPUs handle the initial prompt processing, known as prefill, where compute density matters most. SambaNova's reconfigurable dataflow units (RDUs) take over during token generation, the memory-bandwidth-heavy decode phase where agents spend most of their time. Intel's Xeon 6 processors manage the orchestration layer that coordinates those two stages, plus all the work that happens outside the model itself: code compilation, API calls, sandbox execution, vector database queries, and the framework infrastructure that connects them (Intel Newsroom).
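To make the three-way split concrete, here is a minimal sketch of how one agent request might be broken into phases and assigned to processor classes. The phase-to-processor mapping follows the description above; the identifiers (`Phase`, `PROCESSOR_FOR_PHASE`, `plan_request`) and the assumption that every tool call triggers one more model turn are illustrative only, not part of any Intel or SambaNova API.

```python
from enum import Enum


class Phase(Enum):
    ORCHESTRATE = "orchestrate"  # host-side coordination of a model turn
    PREFILL = "prefill"          # initial prompt processing
    DECODE = "decode"            # token generation
    TOOL = "tool"                # work outside the model (code, APIs, queries)


# Mapping as described in the article. The string labels are hypothetical
# identifiers chosen for this sketch, not vendor API names.
PROCESSOR_FOR_PHASE = {
    Phase.ORCHESTRATE: "xeon6",
    Phase.PREFILL: "gpu",
    Phase.DECODE: "rdu",
    Phase.TOOL: "xeon6",
}


def plan_request(tool_calls: int) -> list[tuple[Phase, str]]:
    """Return the ordered (phase, processor) steps for one agent request.

    Assumption: each model turn is orchestrate -> prefill -> decode, and
    every tool call adds one CPU-side tool step plus another model turn.
    """
    if tool_calls < 0:
        raise ValueError("tool_calls must be non-negative")
    turn = [Phase.ORCHESTRATE, Phase.PREFILL, Phase.DECODE]
    phases = list(turn)
    for _ in range(tool_calls):
        phases.append(Phase.TOOL)
        phases.extend(turn)
    return [(phase, PROCESSOR_FOR_PHASE[phase]) for phase in phases]


if __name__ == "__main__":
    for phase, processor in plan_request(tool_calls=2):
        print(f"{phase.value:<12} -> {processor}")
```

Under these assumptions, a request with two tool calls touches the CPU in five of its eleven steps, which is the pattern the rest of the announcement is built around.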
This is not an incremental improvement to an existing system. It is an explicit repudiation of the idea that a single processor type can efficiently handle every phase of AI inference.
"GPUs are very good at parallelizing matrix math for input processing," Anton McGonnell, vice president of product at SambaNova, told Data Center Knowledge. "They are not good at decoding, especially when you have latency-sensitive workloads."
The numbers from Creative Strategies CEO Ben Bajarin put the infrastructure math in plain terms: during the training era, the typical AI deployment looked like roughly one CPU for every four GPUs. Agentic inference, where models must respond continuously to evolving multi-step tasks, shifts that ratio to roughly one CPU per GPU, or even fewer GPUs per CPU. Every GPU accelerator in a rack now needs a proportionally larger CPU backbone to keep it fed (Intel Newsroom).
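The ratio shift is easy to check with a few lines of arithmetic. The sketch below assumes a hypothetical 64-GPU deployment (that figure is not from any of the cited sources) and compares the CPU count implied by the two ratios quoted above.

```python
import math


def cpus_needed(gpus: int, gpus_per_cpu: float) -> int:
    """CPUs required to serve `gpus` accelerators at a given GPU-per-CPU ratio."""
    if gpus < 0 or gpus_per_cpu <= 0:
        raise ValueError("gpus must be >= 0 and gpus_per_cpu must be > 0")
    return math.ceil(gpus / gpus_per_cpu)


DEPLOYMENT_GPUS = 64  # hypothetical deployment size, for illustration only

training_era_cpus = cpus_needed(DEPLOYMENT_GPUS, gpus_per_cpu=4)  # 1 CPU : 4 GPUs
agentic_era_cpus = cpus_needed(DEPLOYMENT_GPUS, gpus_per_cpu=1)   # 1 CPU : 1 GPU

if __name__ == "__main__":
    print(f"training-era ratio: {training_era_cpus} CPUs")
    print(f"agentic-era ratio:  {agentic_era_cpus} CPUs")
    print(f"CPU count multiplier: {agentic_era_cpus / training_era_cpus:.0f}x")
```

For the same GPU count, moving from one-to-four to one-to-one quadruples the number of CPUs the deployment has to house, power, and pay for.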
What Changed in the Agentic Era
The training era was straightforward: models consumed prompts, ran them through expensive forward passes, and returned answers. The hardware optimized for that job. GPUs won.
Agentic workloads broke that simplicity. An AI agent handling a complex task does not fire one prompt and wait for a response. It plans, calls tools, executes code, queries databases, runs tests, and loops back into inference repeatedly until the work is done. Each of those steps has different compute characteristics, and most of them do not benefit from GPU acceleration. They benefit from a fast, dense, x86-compatible CPU that can run an agent harness without blowing the power budget.
"Agents need CPUs for two reasons," SambaNova explained in a technical blog post. "To orchestrate inference and to execute the work around inference" (SambaNova Blog). The company calls the first role the host CPU and the second the action CPU. Both jobs fall to Xeon 6 under the joint blueprint.
The practical implication is that adding GPU compute to an agentic deployment without adding proportional CPU capacity does not linearly improve performance. The agent spends significant time waiting on CPU-bound work that GPUs cannot accelerate. The disaggregated approach addresses this by matching each phase to its optimal processor, rather than forcing all phases through a single accelerator.
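The argument above is a version of Amdahl's law: if part of each agent step is CPU-bound, speeding up only the accelerator portion yields diminishing returns. The sketch below is a simplified model, assuming the CPU-bound and accelerator-bound portions run strictly one after the other. The 40 percent CPU share is a hypothetical figure chosen for illustration, not a measurement from Intel or SambaNova.

```python
import math


def agent_speedup(cpu_fraction: float, gpu_scale: float) -> float:
    """Overall speedup when only the accelerator-bound share gets faster.

    cpu_fraction: share of baseline step time spent on CPU-bound work (0..1).
    gpu_scale: factor by which accelerator throughput is increased (> 0).
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if gpu_scale <= 0:
        raise ValueError("gpu_scale must be positive")
    accel_fraction = 1.0 - cpu_fraction
    return 1.0 / (cpu_fraction + accel_fraction / gpu_scale)


def speedup_ceiling(cpu_fraction: float) -> float:
    """Upper bound on speedup as accelerator capacity grows without limit."""
    return math.inf if cpu_fraction == 0 else 1.0 / cpu_fraction


if __name__ == "__main__":
    CPU_SHARE = 0.4  # hypothetical: 40% of each step is CPU-bound
    for scale in (2, 4, 8):
        print(f"{scale}x accelerators -> {agent_speedup(CPU_SHARE, scale):.2f}x overall")
    print(f"ceiling: {speedup_ceiling(CPU_SHARE):.2f}x")
```

With a 40 percent CPU-bound share, doubling accelerator capacity buys roughly a 1.4x improvement, and no amount of additional GPU capacity can push the step past 2.5x. Real deployments overlap phases and batch requests, so the model is only a first-order guide.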
The New Players Behind the Architecture
The announcement also surfaces a new entity that did not exist a year ago. Vector Core Compute is a purpose-built enterprise inference cloud formed by Vista Equity Partners and Cambium Capital (Intel Newsroom). At Computex, the company demonstrated fully disaggregated inference running Intel Xeon 6 for orchestration, SambaNova RDUs for decode, and NVIDIA Blackwell GPUs for prefill. The demonstration ran from a Vector Core Compute facility in Los Angeles.
Together.AI is the first commercial customer on the platform, claiming the fastest enterprise inference on the MiniMax 2.5 model of any architecture to date. Vista Equity has secured early access for its portfolio companies, which collectively serve more than 2.5 million enterprise customers and 750 million users worldwide (Intel Newsroom).
The Together.AI claim is currently unverified by independent benchmarks, and Vector Core Compute has not published pricing or availability data. Both are legitimate open questions before treating this as a market signal rather than a launch announcement.
Foxconn is the other named partner, handling system integration for the rack-scale designs and manufacturing a CPU-dense variant for workloads that do not need GPU acceleration at all: cost-optimized inference, data processing, and hybrid AI deployments where the agent harness runs but the model does not require a discrete accelerator (Intel Newsroom).
The 18A Question
Xeon 6+ is notable for a reason Intel is careful not to bury in the density numbers: it is the first data center CPU built on Intel 18A, the company's most advanced process node and the same technology at the center of Intel's foundry ambitions (Data Center Dynamics).
This is not a minor manufacturing milestone. Intel's fabrication business has spent years trying to demonstrate that 18A can produce working silicon at volume. A flagship Xeon product on that node is a public statement of confidence in a node that has had a difficult path to production.
The chip specifications reflect the density demands of the agentic era specifically: up to 288 efficient cores per package, 576 megabytes of last-level cache configured as a sandbox for tool execution, 12 channels of DDR5 memory running at 8,000 megatransfers per second, and 96 lanes of PCIe Gen 5 alongside 64 lanes of CXL for accelerator connectivity (Intel Newsroom).
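A few derived figures help put those specifications in context. The sketch below is back-of-the-envelope arithmetic, not vendor data. It assumes a standard 64-bit (8-byte) data path per DDR5 channel, and it assumes the 36,864-core, 100-kilowatt rack cited earlier is populated entirely with 288-core packages. The per-package power figure is a share of the whole rack budget, including everything that is not a CPU, so it is not a chip TDP.

```python
CORES_PER_PACKAGE = 288
LLC_MEGABYTES = 576
MEMORY_CHANNELS = 12
DDR5_MEGATRANSFERS_PER_S = 8000
RACK_CORES = 36_864
RACK_POWER_WATTS = 100_000

# Assumption: each DDR5 channel moves 8 bytes (64 data bits) per transfer.
BYTES_PER_TRANSFER = 8

# Theoretical peak memory bandwidth per package, in GB/s (decimal gigabytes).
peak_bandwidth_gbs = MEMORY_CHANNELS * DDR5_MEGATRANSFERS_PER_S * BYTES_PER_TRANSFER / 1000

bandwidth_per_core_gbs = peak_bandwidth_gbs / CORES_PER_PACKAGE
llc_per_core_mb = LLC_MEGABYTES / CORES_PER_PACKAGE

# Assumption: every package in the rack is the full 288-core part.
packages_per_rack = RACK_CORES // CORES_PER_PACKAGE
rack_watts_per_package = RACK_POWER_WATTS / packages_per_rack
rack_watts_per_core = RACK_POWER_WATTS / RACK_CORES

if __name__ == "__main__":
    print(f"peak memory bandwidth per package: {peak_bandwidth_gbs:.0f} GB/s")
    print(f"peak memory bandwidth per core:    {bandwidth_per_core_gbs:.2f} GB/s")
    print(f"last-level cache per core:         {llc_per_core_mb:.0f} MB")
    print(f"packages per rack:                 {packages_per_rack}")
    print(f"rack power budget per package:     {rack_watts_per_package:.2f} W")
    print(f"rack power budget per core:        {rack_watts_per_core:.2f} W")
```

Under these assumptions the rack works out to 128 packages, each with a theoretical 768 GB/s of memory bandwidth and 2 MB of last-level cache per core, inside a rack budget of under 3 watts per core.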
Intel claims the Xeon 6990E+, the top-bin part, delivers 1.3 times the performance per thread and 1.3 times the performance per thread per watt versus AMD's Epyc 9965. Those are Intel's own numbers, presented at a press conference, and should be treated accordingly (Data Center Dynamics).
Ericsson, which is deploying Xeon 6+ in its network infrastructure, reported gains of its own: a 30 percent performance improvement at equivalent core counts, 60 percent better performance per watt, and a 38 percent reduction in runtime rack power consumption (Data Center Dynamics). Those numbers are worth noting because they come from a customer, not a vendor press release.
The Infrastructure Implication
The aggregate direction is clear even if every individual claim warrants skepticism: the industry is responding to a real shift in where agentic workloads spend their cycles. The GPU is not being displaced from AI. But the CPU is no longer a supporting actor in the data center AI stack. It is becoming a first-class citizen again, specifically because agents do work that the GPU was never designed to do.
This creates a structural challenge for inference providers who built their cost models around GPU-only deployments. If the CPU-to-GPU ratio is genuinely shifting toward one-to-one, the procurement math for agentic infrastructure changes significantly. Dense CPU racks like the 36,864-core design Intel showed this week are not a premium option for edge cases. They may be the baseline for anyone running agentic workloads at scale.
The disaggregated inference blueprint from Intel and SambaNova is a bet on exactly that shift. Whether it survives contact with real enterprise procurement cycles remains the open question. The architecture is sound. The commercial evidence is still thin.