A startup called Etched reported $1 billion in chip sales at a $5 billion valuation on June 30, 2026. The wire read is "Nvidia competitor raises money." The mechanism underneath is more interesting, because transformer-specific silicon only makes sense if one particular bet about how AI workloads are evolving turns out to be right.
Transformers are the architecture behind today's large language models: a pattern of math operations (matrix multiplications layered with attention) that turned out to scale surprisingly well when fed enough text. The same handful of operations show up across nearly every modern model, from OpenAI's GPT line to Anthropic's Claude and Google's Gemini. A general-purpose GPU like Nvidia's H100 is flexible precisely because it can run almost any kind of math. The cost of that flexibility is silicon area, power, and memory bandwidth spent on patterns the workload does not actually need. Etched's bet is that the transformer pattern is stable enough to justify designing whole server racks around one architecture, treating an entire rack as a single computational unit rather than assembling it from general-purpose parts.
That is what an application-specific integrated circuit, or ASIC, is supposed to buy you: silicon engineered for one job, running faster and cheaper than a generalist chip. Etched's Sohu product is a transformer-only ASIC paired with cluster-scale memory and tuned for low-voltage inference. Inference is the operation of running an already-trained model to produce answers, as opposed to training it from scratch. Manufacturing runs through TSMC partnerships, and the company has raised $800 million from Jane Street and a fund linked to TSMC. Per Etched's own materials, the pitch is cost-performance on transformer workloads, not flexibility across workloads.
Today, hyperscalers and model labs size their inference budgets in terms of GPU seats: how many H100s or B100s they need to handle peak request load. If Sohu works at the rack scale Etched claims, the accounting changes. A purpose-built transformer chip can plausibly deliver more useful inference per dollar, per watt, and per square foot of data center. That would shift three things: negotiating leverage over Nvidia's margin, how hyperscalers fund data center buildouts, and which suppliers get locked into multi-year capacity contracts.
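To make the per-dollar and per-watt framing concrete, here is a minimal sketch of how a buyer might compare serving cost across two accelerators. Every number in it is a hypothetical placeholder chosen for easy arithmetic, not a figure from Etched, Nvidia, or any customer, and the `Accelerator` fields and the `cost_per_million_requests` helper are illustrative names rather than anyone's actual accounting model.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class Accelerator:
    name: str
    capex_usd: float         # purchase price per unit (hypothetical)
    lifetime_years: float    # amortization period (hypothetical)
    power_kw: float          # average power draw per unit (hypothetical)
    requests_per_sec: float  # sustained serving throughput (hypothetical)


def cost_per_million_requests(a: Accelerator, usd_per_kwh: float) -> float:
    """Amortized hardware cost plus energy cost, per one million requests."""
    capex_per_hour = a.capex_usd / (a.lifetime_years * HOURS_PER_YEAR)
    energy_per_hour = a.power_kw * usd_per_kwh
    requests_per_hour = a.requests_per_sec * 3600
    return (capex_per_hour + energy_per_hour) / requests_per_hour * 1_000_000


# Placeholder inputs: identical price, lifetime, and power draw, so the only
# difference between the two is the assumed throughput.
gpu = Accelerator("general-purpose GPU", 30_000, 4, 1.0, 100)
asic = Accelerator("transformer ASIC", 30_000, 4, 1.0, 300)

USD_PER_KWH = 0.10  # assumed electricity price

for chip in (gpu, asic):
    cost = cost_per_million_requests(chip, USD_PER_KWH)
    print(f"{chip.name}: ${cost:.2f} per million requests")
```

With price and power held equal, the cost per request falls in direct proportion to throughput, which is the whole of the bull case in one line. The sketch leaves out cooling, networking, floor space, and utilization, any of which could move the comparison in either direction.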
The single sharpest claim, and the most lightly sourced, is the $1 billion chip sales figure. It rests on TechCrunch and TheNextWeb reporting from June 30, 2026, without a company filing, investor letter, or audited statement behind it. Whether that $1 billion is concentrated in a handful of early adopters also matters, particularly if those customers run batch inference: cheap asynchronous work that can tolerate higher latency. A transformer ASIC optimized for batch jobs would not reprice the broader inference market in the way the bull case requires. Real-time production serving is what hyperscalers actually deploy at scale, and that is the workload the thesis has to win.
There is also a prior-art shadow. Google's TPU is the closest analogue: a transformer-era accelerator developed in-house and used at scale inside one company's own data centers. TPUs have not collapsed Nvidia's position in the broader market, in part because owning the chip and writing the model that runs on it lets Google internalize trade-offs an outside vendor cannot. Etched's bet is that an outside vendor can reach the same efficiency at competitive cost, and that a chip not designed in lockstep with a single model can still eat the inference market.
Watch for independent deployment disclosures from non-early-adopter customers. Watch for hyperscaler commentary on inference cost per request. Watch for any audited figure behind the $1 billion sales number, or an SEC filing that confirms it. Until those land, both the transformative picture and the falsifying one fit the available evidence.