Mitesh Agrawal does not open with a benchmark. He opens with a chip you would find in a smartphone.
That is the supply chain decision that got Positron AI into Oracle's cloud: the first silicon from a company other than Nvidia or AMD running in a major hyperscaler's live data center, the company says, while every other AI chip startup is still waiting in line. Rather than chase the advanced memory that Nvidia and Apple have locked up at TSMC Arizona through 2027, Positron built its inference accelerators around LPDDR: commodity DRAM, the kind in phones and laptops. The deal is worth tens of millions of dollars. In a world where Oracle spends billions on Nvidia GPUs annually, that is not a large number. But it is a start, and in the inference chip business, a start is more than anyone else has managed.
"We are using organic substrate plus DRAM as attached memory to the chip," Agrawal told EE Times. "We are using LPDDR memory, which is commodity memory. Allocation is there."
The contrast with the standard approach is the entire competitive story. The advanced memory everyone else runs, HBM, or High Bandwidth Memory, requires CoWoS packaging that TSMC Arizona has allocated to Apple and Nvidia through at least 2027. HBM-dependent startups like Groq and Cerebras are waiting for capacity that will not free up for years. Positron sidestepped the fight by choosing a memory technology that is slower on paper, one that does not win benchmark comparisons on raw bandwidth but also does not require years on a waiting list for cutting-edge packaging.
The architecture bet only makes sense if the memory wall problem is real. Karl Freund, principal analyst at Cambrian-AI Research, told EE Times that existing inference chips, including those from Groq and Cerebras, hit a ceiling when models exceed roughly 500 billion parameters. "Inferencing on larger models and/or large contexts over 500 billion parameters demands a new architecture to deliver massive capacity and bandwidth," he said. Positron is not trying to beat HBM on speed. It is trying to solve the capacity problem, fitting the full model and context window in memory at all, by using LPDDR modules ganged together in quantity.
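To make the capacity arithmetic concrete, here is a rough back-of-envelope sketch. The model dimensions and the FP8 (one byte per parameter) assumption are illustrative, not Positron or Freund figures:

```python
# Back-of-envelope memory sizing for large-model inference.
# All numbers are illustrative assumptions, not vendor specs:
# FP8 weights (1 byte/parameter) and a simplified KV-cache formula.

def weights_gb(params_billion: float, bytes_per_param: int = 1) -> float:
    """Memory for the model weights alone, in GB."""
    return params_billion * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_val: int = 1) -> float:
    """Per-request KV cache: 2 (K and V) * layers * kv_heads * head_dim * tokens."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_val / 1e9

# A hypothetical 500B-parameter model with a 128K-token context window.
model = weights_gb(500)                       # ~500 GB of weights at FP8
cache = kv_cache_gb(layers=120, kv_heads=8,
                    head_dim=128, context_tokens=128_000)  # ~31 GB per request
print(f"weights: {model:.0f} GB, KV cache per request: {cache:.0f} GB")
```

At these assumed sizes, the weights alone exceed the HBM on any single GPU shipping today, which is the capacity ceiling Freund describes and the gap Positron is aiming at.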
The chip making that case is the Asimov accelerator, due in mid-2027, which holds up to 2.3 terabytes of memory per chip. Positron claims over 90% memory bandwidth utilization on real Transformer workloads, versus under 30% for GPUs running the same models. The 400-watt thermal design power deliberately targets existing data centers that cannot fit liquid-cooled Blackwells or Rubins into their racks. Against Nvidia's upcoming Rubin chip, Positron claims 5x tokens per dollar and 5x tokens per watt, though those are future-product numbers: Asimov does not ship until 2027.
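The utilization claim matters because autoregressive decoding is typically memory-bound: generating each token means streaming the full weight set from memory, so delivered bandwidth, not peak bandwidth, sets the throughput ceiling. A minimal sketch of that tradeoff, with illustrative bandwidth and model-size numbers rather than vendor specs:

```python
# Memory-bound decode throughput, batch size 1, compute ignored:
# tokens/sec ~= (peak_bandwidth * utilization) / bytes_moved_per_token.
# Bandwidth and model-size figures below are assumptions for illustration.

def tokens_per_sec(peak_bw_tbs: float, utilization: float,
                   model_bytes_gb: float) -> float:
    effective_tbs = peak_bw_tbs * utilization
    return effective_tbs * 1e12 / (model_bytes_gb * 1e9)

MODEL_GB = 500  # hypothetical 500B-parameter model at FP8

# A lower-peak memory system at 90% utilization vs. a faster one at 30%:
print(tokens_per_sec(peak_bw_tbs=2.0, utilization=0.9,
                     model_bytes_gb=MODEL_GB))  # ~3.6 tokens/sec
print(tokens_per_sec(peak_bw_tbs=4.0, utilization=0.3,
                     model_bytes_gb=MODEL_GB))  # ~2.4 tokens/sec
```

The sketch shows only the shape of the argument: on a memory-bound workload, a system with half the peak bandwidth but three times the utilization comes out ahead.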
The product Oracle deployed is Positron's current Atlas accelerator. In Tom's Hardware benchmarking, Atlas delivered 280 tokens per second per user on Llama 3.1 8B at 2,000 watts, versus 182 tokens per second per user for an Nvidia DGX H200 server. That is a real result from an independent outlet. It is also a small model on current silicon, not the 500-billion-parameter-plus models and long contexts the Asimov architecture is designed around.
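For readers who want the efficiency math, the comparison only partially closes with what is quoted: the benchmark summary above gives Atlas's power draw but not the DGX H200's, so only these ratios are computable from the figures on the page:

```python
# Using only the numbers quoted above; the DGX H200's measured wall power
# is not given here, so no per-watt comparison is attempted.

ATLAS_TOKS, ATLAS_WATTS = 280, 2000   # tokens/sec/user, system watts (quoted)
H200_TOKS = 182                       # tokens/sec/user (quoted)

print(f"per-user throughput ratio: {ATLAS_TOKS / H200_TOKS:.2f}x")        # ~1.54x
print(f"Atlas tokens/sec/user per kW: {ATLAS_TOKS / (ATLAS_WATTS/1e3):.0f}")  # 140
```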
One constraint Positron's supply chain strategy could not avoid: TSMC Arizona allocation. The foundry is booked out by Apple and Nvidia through at least 2027. Positron will manufacture Asimov in Taiwan and hopes for Arizona allocation in 2028 when Apple and AMD shift to 2-nanometer processes.
Deloitte projects that inference will account for roughly two-thirds of all AI compute workloads by 2026 and grow into a $50 billion market. That is the prize: a market Nvidia currently holds at near-100% share, where the tools to compete are not just silicon but the supply chain to manufacture it. Whether LPDDR's commodity availability can deliver enough memory bandwidth to make the architecture work at frontier scale is the open question. Oracle is the only major customer so far. The deal is real. The chip with the boldest claims ships next year.