Tensordyne bets it can make AI chips skip multiplication

The AI chip industry is built on multiplication. Every modern GPU devotes the bulk of its silicon area to fast multiply-accumulate circuits, which perform the workhorse operation that turns billions of matrix values into the next layer of a neural network. A small startup called Tensordyne is betting that multiplication is a habit the industry can break.

The company's first commercial chip, Napier, is an AI accelerator now being fabricated on TSMC's 3nm process. Rather than multiplying, Napier converts numbers into logarithms and adds them. Multiplication is just addition in disguise once you write numbers on a log scale: 1,000 times 100 equals 100,000 the slow way, but log 1,000 plus log 100 equals log 100,000 (in base 10, 3 plus 2 equals 5), and the antilog turns that back into the answer. The trick has been used in numerical methods for decades. Tensordyne's claim is that it can be done at AI-chip scale without giving up too much accuracy.
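The log identity in the paragraph above can be checked in a few lines. This is only an illustration of the arithmetic in software, with a hypothetical helper name (`multiply_via_logs`); it says nothing about how Napier's circuits are actually built.

```python
import math


def multiply_via_logs(a: float, b: float) -> float:
    """Multiply two positive numbers by adding their base-10 logarithms."""
    if a <= 0 or b <= 0:
        raise ValueError("logarithms are only defined for positive inputs")
    log_sum = math.log10(a) + math.log10(b)  # the only arithmetic step is an addition
    return 10 ** log_sum                     # the antilog recovers the product


# The article's example: log 1,000 + log 100 = log 100,000.
product = multiply_via_logs(1_000, 100)
print(round(product))
```

In exact arithmetic the result is identical to ordinary multiplication. The accuracy question only arises when the logarithm and antilog are themselves approximated, which is what cheap hardware has to do.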

That last point is the load-bearing one. Log-domain math is fast and energy-efficient, but it is not exact. The fast way to estimate a logarithm in hardware is the Mitchell approximation, a 1962 method that reads a base-2 logarithm almost directly off a number's binary representation using only shifts and adds. It is lossy by construction. Tensordyne pairs the approximation with a section-wise correction that the company says recovers FP16-equivalent accuracy inside the multiply-accumulate unit. The chip also supports native FP8 and 4-bit block floating point formats for less demanding layers. The mechanism is real, and the 3nm tape-out (a design completed and sent for fabrication) is a concrete milestone. What has not been verified yet is the headline number.
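To make the "lossy by construction" point concrete, here is a minimal software sketch of the textbook, uncorrected Mitchell approximation. It writes a positive number as 2^k × (1 + f) with f between 0 and 1 and takes log2 to be roughly k + f. The function names are hypothetical, and the sketch does not attempt Tensordyne's section-wise correction, whose details the article does not describe.

```python
import math


def mitchell_log2(x: float) -> float:
    """Approximate log2(x) for x > 0 as k + f, where x = 2**k * (1 + f), 0 <= f < 1."""
    if x <= 0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)          # x = m * 2**e with 0.5 <= m < 1
    k = e - 1                     # rewrite as x = (2m) * 2**(e - 1)
    f = 2.0 * m - 1.0             # fractional part, 0 <= f < 1
    return k + f                  # Mitchell: log2(1 + f) is taken to be f


def mitchell_exp2(y: float) -> float:
    """Inverse of the approximation above: 2**y is taken to be 2**k * (1 + f)."""
    k = math.floor(y)
    f = y - k
    return math.ldexp(1.0 + f, k)  # (1 + f) * 2**k


def mitchell_multiply(a: float, b: float) -> float:
    """Approximate a * b with one addition in the (approximate) log domain."""
    return mitchell_exp2(mitchell_log2(a) + mitchell_log2(b))


for a, b in [(4, 8), (3, 3), (5, 7)]:
    approx = mitchell_multiply(a, b)
    exact = a * b
    print(a, b, approx, exact, f"{(exact - approx) / exact:.1%} low")
```

Powers of two come out exact because their fractional part is zero. The worst case is when both fractional parts are 0.5, as with 3 × 3, where the sketch returns 8 instead of 9, an underestimate of about 11 percent. Errors of that size are why an uncorrected approximation would not be enough on its own and why the correction step carries so much of the accuracy claim.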

The headline number is 17x. Tensordyne says Napier can deliver up to 17 times more tokens per watt and roughly 13 times the throughput of Nvidia's current Blackwell generation. Both figures come from a cofounder interview in The Register, not from an independent MLPerf-style benchmark. Tensordyne has taped out Napier and started fabrication, but the chip is not due to ship in volume until the second or third quarter of 2027. The Register explicitly cautions against reading those numbers as settled fact, and any reader evaluating them should do the same.

The per-chip specs are concrete. Napier draws 300 watts, carries 144GB of HBM3e memory across four stacks, and pushes 4.7 TB/s of memory bandwidth. Peak dense FP8 compute lands at 2.1 PFLOPS, roughly H200-class throughput at about 60 percent of the power. The system Tensordyne is selling around it, called TDN72, packs eight air-cooled blades, each with a 10-core Intel Xeon-D host and nine Napier chips, for 72 accelerators per pod. A 52U rack can hold up to four TDN72 pods, totaling 608 PFLOPS at 120 kilowatts. The Register reports that comes out to about 1.68 times the dense FP8 throughput per rack of Nvidia's GB200 NVL72. Per-pod power sits at 30 kilowatts, air-cooled, which lets the system slot into existing brownfield datacenters without liquid cooling.
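The rack-level totals can be sanity-checked from the per-chip figures. The sketch below uses only numbers quoted in the article and assumes nine chips on each of eight blades, which is the reading that yields 72 accelerators per TDN72 pod. The variable names are illustrative.

```python
# Figures quoted in the article.
CHIP_POWER_W = 300          # watts per Napier chip
CHIP_FP8_PFLOPS = 2.1       # peak dense FP8 per chip
BLADES_PER_POD = 8
CHIPS_PER_BLADE = 9         # assumption: nine chips per blade, 72 per pod
PODS_PER_RACK = 4
POD_POWER_KW = 30           # quoted per-pod power budget

chips_per_pod = BLADES_PER_POD * CHIPS_PER_BLADE
chips_per_rack = chips_per_pod * PODS_PER_RACK

rack_pflops = chips_per_rack * CHIP_FP8_PFLOPS
rack_power_kw = PODS_PER_RACK * POD_POWER_KW

# Power drawn by the accelerators alone; the rest of the pod budget
# covers hosts, networking, fans, and conversion losses.
pod_chip_power_kw = chips_per_pod * CHIP_POWER_W / 1000

print(f"chips per pod:    {chips_per_pod}")
print(f"chips per rack:   {chips_per_rack}")
print(f"rack dense FP8:   {rack_pflops:.1f} PFLOPS")
print(f"rack power:       {rack_power_kw} kW")
print(f"chip power / pod: {pod_chip_power_kw:.1f} kW of {POD_POWER_KW} kW")
```

Under those assumptions the rack works out to 288 chips and about 605 PFLOPS, close to the quoted 608, so the 2.1 PFLOPS per-chip figure is presumably rounded. The accelerators account for roughly 21.6 of each pod's 30 kilowatts.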

That brownfield angle is the most defensible part of the pitch. A 30-kilowatt air-cooled pod that fits in a room built for older hardware is a niche Nvidia does not target with its higher-density racks. Neoclouds Cirrascale and BlueSky Compute are named as interested customers. The compiler can convert existing models, and the software stack already supports the vLLM serving platform alongside a proprietary offering. PyTorch support is still under development.

Two structural risks sit underneath the marketing. First, Napier supports FP4 weights but not Nvidia's NVFP4 format, the precision standard the rest of the industry is consolidating around. Second, by the time Napier ships in 2027, Nvidia will be selling Vera Rubin and Vera Rubin Ultra, not Blackwell. Beating the current generation is one thing. Beating the next two is a different fight.

A useful question to keep for the next "Nvidia-killer" headline is the same one that applies here: what is the chip, what is the mechanism, and what is independently verified? Tensordyne has answered the first two. The third is still open.
