OpenAI's 'Jalapeño' chip is a first-generation bet on cheaper AI inference
The Broadcom-built processor is purpose-designed for serving already-trained AI models, with OpenAI's product roadmap shaping the design.
OpenAI and Broadcom did not build a chip the way most hardware deals are struck. They built it from the model outward, using OpenAI's own knowledge of how its large language models actually consume compute in production to shape the architectural decisions. The result, an inference-focused ASIC (Application-Specific Integrated Circuit, a chip designed from scratch for one workload rather than a general-purpose processor) codenamed Jalapeño, is the first generation of a multi-year co-design program. The partnership that produced it may matter more than the silicon itself.
The announcement positions Jalapeño as purpose-built for inference, the work of running already-trained language models like those behind ChatGPT, as distinct from the much costlier training phase that builds them. Broadcom publicly framed the chip as informed by "detailed insights" from OpenAI researchers and explicitly mapped to OpenAI's own roadmap for future models and products (Broadcom investor release; OpenAI announcement). That is co-design in the strict sense: not a customer buying off-the-shelf silicon, but an AI lab handing the chip supplier a roadmap and working backward from there.
The most operationally telling number is the timeline. Roughly nine months from design start to announced production is unusually short for a from-scratch ASIC, and Ars Technica's reporting flags the speed itself as a signal about who brought what to the table (Ars Technica). Read narrowly, the timeline implies Broadcom contributed substantial reusable IP (packaging, interconnect, and underlying process know-how), while OpenAI contributed the workload specification and the pressure to ship. Read broadly, it suggests both companies are treating this as iteration infrastructure rather than a one-off procurement.
That iteration framing is explicit in the announcement. Both OpenAI and Broadcom describe Jalapeño as the first generation of a longer-term program, which means the chip is best read as a direction-of-travel signal rather than a finished product (Broadcom investor release). The competitive field is already crowded. Nvidia still dominates datacenter GPUs for both training and inference, Google's TPUs serve its own stack, Amazon's Trainium and Inferentia chips underpin AWS workloads, and Microsoft has been developing its own Maia accelerators. A jointly designed inference chip from a non-GPU supplier is one answer to the question of concentration among accelerator vendors, not the only one available.
The skepticism that has to travel with the announcement is just as concrete. OpenAI's performance claim, "early testing shows that Jalapeño will deliver performance per watt substantially better than current state-of-the-art," is preliminary, vendor-sourced, and explicitly framed as ongoing (OpenAI announcement). No third-party benchmark exists yet. The process node, memory hierarchy, and interconnect have not been disclosed. The promised "detailed technical report" on the chip and its deployment has not landed. A nine-month design cycle for a from-scratch ASIC is fast enough to warrant real scrutiny, not just applause. "First generation," in short, is the companies' own framing, and it leaves room for the next revision to look meaningfully different.
Why this matters beyond the chip: the cost of inference, including its energy bill, is now the dominant operating expense for AI labs serving large user bases, a problem that is no longer hypothetical for OpenAI as it operates ChatGPT at scale. A co-designed inference chip with Broadcom shifts OpenAI's bargaining position with the dominant accelerator vendors even if Jalapeño itself never ships in high volume. The strategic question is whether this kind of partnership can make running large language models measurably cheaper and less dependent on a few suppliers. For the next several months, the more immediate question is whether the promised technical report lands with enough specificity to let outside observers answer it.