OpenAI has built its first chip designed from scratch to run, rather than train, its AI models. The chip, called Jalapeño, was co-developed with Broadcom and unveiled on June 24, 2026, and it is best read as a declaration that the operating leverage in frontier AI has moved from training to inference.
Inference is the step where a trained model actually answers a user, and that is where OpenAI's bills now live. ChatGPT, the API, and a growing fleet of coding agents serve more than 800 million people a week, according to the company's October 2025 strategic collaboration announcement, and each request is a small unit of compute that, in aggregate, dominates the company's cost structure. Jalapeño is a chip built for that workload alone.
The technical choices follow from the workload. The chip was designed as a clean-sheet inference processor rather than an adaptation of an earlier AI accelerator, and it is paired with Broadcom's Tomahawk networking silicon so that thousands of chips can act as one. Celestica is co-developing the board and rack integration. Engineering samples are already running OpenAI's own models, including a variant called GPT-5.3-Codex-Spark, at the frequency and power levels targeted for the shipping part, per the company's Jalapeño announcement.
The design cycle was unusually short. OpenAI and Broadcom took the chip from concept to tape-out in nine months, which Broadcom has called the fastest ASIC development cycle ever recorded in high-performance advanced semiconductors. Parts of the design and optimization pipeline were assisted by OpenAI's own models, a small detail that points to the feedback loop OpenAI is trying to build between its research and its infrastructure.
The cost story is the one that will draw the most scrutiny, and it is also the least settled. OpenAI has said only that early testing shows performance-per-watt, or how much useful work a chip does per unit of power, is "substantially better than current state-of-the-art." That is a company claim about a chip that has not yet shipped, with no independent benchmark to test it against. The unit-economics case for custom silicon should be held as a working hypothesis rather than a result.
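To make the metric concrete, here is a minimal sketch of how performance-per-watt could translate into an electricity cost per million tokens. Every number in it is a hypothetical placeholder chosen for illustration, not a figure from OpenAI or Broadcom, and the function names are invented for this example. It also assumes that "useful work" is measured in tokens generated, which the announcement does not specify.

```python
# Illustrative arithmetic only: all inputs below are hypothetical placeholders,
# not published figures for Jalapeño or any other chip.

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def perf_per_watt(tokens_per_second: float, watts: float) -> float:
    """Tokens generated per joule (tokens/s divided by joules/s)."""
    return tokens_per_second / watts


def energy_cost_per_million_tokens(tokens_per_joule: float, usd_per_kwh: float) -> float:
    """Electricity cost in USD to generate one million tokens."""
    joules_needed = 1_000_000 / tokens_per_joule
    return (joules_needed / JOULES_PER_KWH) * usd_per_kwh


# Hypothetical comparison: same power draw, different throughput.
baseline = perf_per_watt(tokens_per_second=2_000, watts=1_000)  # assumed values
improved = perf_per_watt(tokens_per_second=3_000, watts=1_000)  # assumed values

price = 0.08  # assumed electricity price in USD per kWh
baseline_cost = energy_cost_per_million_tokens(baseline, price)
improved_cost = energy_cost_per_million_tokens(improved, price)

# Energy cost scales inversely with performance-per-watt.
print(f"baseline: ${baseline_cost:.4f} per million tokens")
print(f"improved: ${improved_cost:.4f} per million tokens")
```

The sketch covers electricity only. A full unit-economics comparison would also need chip, networking, and data-center capital costs, none of which have been disclosed.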
Deployment plans match the scale of the ambition. OpenAI has said the chip will roll out at gigawatt scale with Microsoft and other partners beginning in 2026, with a multi-generation roadmap. The October 2025 announcement committed to 10 gigawatts of custom AI accelerators, deployed in racks networked with Broadcom's Ethernet equipment. This is a multi-year buildout, not a swap of one supplier for another.
The competitive frame is also narrower than the headlines suggest. Jalapeño does not displace Nvidia. Pre-training of large models, the most compute-intensive step, will still run largely on Nvidia hardware. What changes is the workload split: training stays where it is, and a growing share of the day-to-day cost of running AI moves onto chips OpenAI controls. That is a complement to Nvidia's business rather than a replacement for it, a framing consistent with Reuters' earlier coverage of the custom-chip program.
OpenAI is not the first frontier AI lab to reach this conclusion. Google has spent more than a decade building its Tensor Processing Units, the chips that run Google Search, Gemini, and a growing share of third-party AI workloads on its cloud. Amazon has built Trainium for training and Inferentia for inference to serve AWS customers. The pattern is now the default for any company that runs AI at scale, and the question for the next few years is no longer whether a serious AI lab will own part of its silicon, but how much of the stack it will own, and how fast.
Several questions remain open. Will independent benchmarks confirm OpenAI's performance-per-watt claims once the chip reaches production silicon rather than engineering samples? How quickly will subsequent generations arrive on the multi-generation roadmap? What terms will Microsoft and other partners agree to as the 10-gigawatt plan becomes real orders? And will other frontier labs accelerate their own custom-silicon programs in response?