OpenAI's first custom AI chip took nine months to build, and that speed is the news.
On June 24, 2026, OpenAI and Broadcom unveiled Jalapeño, the lab's first custom processor, purpose-built for running large language models rather than training them. The Broadcom investor release calls Jalapeño an "LLM-optimized intelligence processor." What makes it unusual is not that a tech company designed a chip. It is that OpenAI, a software lab with no silicon team until recently, took a design from concept to working silicon in nine months, using its own models to accelerate parts of the design and tape-out process.
That cycle time is the concrete datapoint behind a much larger shift. Custom AI silicon used to be a hyperscaler-only game, with Google running its TPU program (Tensor Processing Units, Google's line of AI accelerators), Apple replacing Intel with Apple Silicon, Amazon designing Trainium and Inferentia, Meta building MTIA, and Microsoft working on Maia. The TechCrunch Equity podcast that surfaced this beat framed the moment as everyone turning up the heat on Nvidia. The cleaner read is more specific. Workload-specific vertical integration is now the default move for any company whose AI compute bill has outgrown what off-the-shelf chips can deliver.
Jalapeño sits in that bucket. OpenAI has positioned it as a reticle-sized inference chip on 3nm and 2nm process nodes, meaning the die is about as large as a single photomask exposure allows in fabrication. First deployment is targeted for the second half of 2026 under the 10-gigawatt multi-year OpenAI-Broadcom collaboration signed in October 2025. OpenAI has not disclosed Jalapeño's exact throughput, packaging, or high-bandwidth memory configuration, so specific performance comparisons should be treated as preliminary. The more consequential detail is what Broadcom said about who gets to buy the chip: it framed Jalapeño as supporting "present and future LLMs across the industry," opening the door to third-party sale. As Tom's Hardware's custom-AI-ASIC state-of-play analysis put it, that line materially shifts the competitive framing, because it positions OpenAI as a distribution channel for inference silicon that does not come from Nvidia.
The pattern does not stop with OpenAI. SpaceX filed plans in May 2026 for a Texas chip facility it calls Terafab, a project sized at roughly $55 billion in SpaceX's S-1 filing, according to Reuters, with later SpaceX projections reaching roughly $119 billion when satellite and space-adjacent facilities are included. SpaceX's stated goal is to "manufacture our own GPUs," which means a private rocket company is now also moving into the chip-fab business. Google, Amazon, Meta, and Microsoft have all been running this playbook for years. The Tom's Hardware custom-AI-ASIC analysis maps the landscape: Google's TPU program, Amazon's Trainium and Inferentia, Meta's MTIA, and Microsoft's Maia now cover a tier of inference workloads that used to run exclusively on Nvidia. Apple Silicon, the iPhone maker's switch from Intel chips to its own M-series designs, is the historical analogy for what vertical integration actually buys: control over the perf-per-watt curve, not raw dominance.
The headline framing of a wholesale Nvidia displacement is too coarse. Nvidia still ships the most flexible GPU stack for training new frontier models, and it remains the default for the workloads nobody has bothered to specialize for yet. What is happening is more like a hedge that is becoming a structural feature of the supply chain. AI labs and hyperscalers are picking the specific inference jobs where they can win on cost or efficiency and pulling those in-house, while keeping Nvidia for everything else. The labs are not replacing Nvidia. They are routing around it for the parts of the workload that matter most to their own economics.
The first thing to watch from here is cycle time. A nine-month custom chip is a one-off only if the next one takes three years. If OpenAI, Broadcom, SpaceX, and the hyperscalers keep the cadence, the inference market in 2028 will look less like a single-vendor show and more like a tiered supply chain: Nvidia for general training and unfamiliar workloads, custom silicon for the jobs each operator runs every day. The second is whether Broadcom actually starts selling inference silicon built on OpenAI's reference design to other labs. The moment that happens, the Nvidia displacement question stops being theoretical.