Tenstorrent Broke Real-Time Video Generation. Nobody Else Has. Yet.

Tenstorrent says its 256-chip cluster generated 5 seconds of 720p video in 2.4 seconds. Whether that holds under independent testing is the only question that matters.

Sky

Fact-checked by Giskard · Edited by Rachel

Tenstorrent says it has broken a threshold that frontier AI labs have been chasing: real-time video generation at cluster scale. In a demo for EE Times, the company showed its Galaxy cluster, running 256 BlackHole chips across four servers, generating five seconds of 720p video faster than real time, with a best run of 2.4 seconds. The same production-grade model takes roughly 24 seconds on comparable hardware, Tenstorrent claims. That 10x gap would, if verified, mark a fundamental shift in what AI video infrastructure can do.

The demo ran a Prodia-built optimized version of Wan2.2-14B, an open-source video generation model; Prodia is a Tenstorrent partner. It was not a cherry-picked benchmark run. EE Times reporter Niranjan Sitapure provided the text prompt himself during the visit. The company's record, it says, sits at 2.4 seconds. The test run came in at three seconds: still faster than real time, and still roughly eight times faster than the 24 seconds Tenstorrent says comparable hardware takes.
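The figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, not a benchmark: the constants are the numbers reported in this article (the 24-second baseline is Tenstorrent's own claim), and the function names are illustrative.

```python
# Figures as quoted in this article; the baseline is Tenstorrent's claim,
# not an independently measured number.
CLIP_SECONDS = 5.0       # length of the generated video
BASELINE_SECONDS = 24.0  # claimed generation time on comparable hardware
RECORD_SECONDS = 2.4     # Tenstorrent's stated best run
OBSERVED_SECONDS = 3.0   # run observed during the EE Times visit


def realtime_factor(clip_s: float, generation_s: float) -> float:
    """Seconds of video produced per second of wall-clock time (>1 beats real time)."""
    return clip_s / generation_s


def speedup(baseline_s: float, generation_s: float) -> float:
    """How many times faster a run is than the claimed baseline."""
    return baseline_s / generation_s


for label, seconds in (("record run", RECORD_SECONDS), ("observed run", OBSERVED_SECONDS)):
    print(
        f"{label}: {realtime_factor(CLIP_SECONDS, seconds):.2f}x real time, "
        f"{speedup(BASELINE_SECONDS, seconds):.1f}x the claimed baseline"
    )
```

On these inputs the record run works out to about 2.1x real time and 10x the baseline, and the observed run to about 1.7x real time and 8x the baseline. Both ratios depend entirely on the 24-second figure, which has not been independently verified.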

The hardware is Tenstorrent's own BlackHole processor: 120 Tensix cores, 32GB of GDDR6 memory per chip, and a cluster interconnect built on 800-gigabit Ethernet, according to the company's product page. No proprietary switching fabric. No reconfiguration between workloads. Jim Keller, the chip architect who joined Tenstorrent after designing AMD's Zen architecture and Apple's A-series chips, has built the company around a simple bet: that general-purpose compute, connected at scale over standard Ethernet, can outlast the purpose-built GPU clusters that Nvidia sells for eye-watering sums.
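For a sense of scale, the per-chip figures can be multiplied out across the cluster. This is a rough sketch that assumes all 256 chips are identical and split evenly across the four servers; the totals are simple products of the numbers quoted here, not published cluster specifications.

```python
# Per-chip and cluster figures as quoted in this article.
CHIPS = 256
SERVERS = 4
TENSIX_CORES_PER_CHIP = 120
GDDR6_GB_PER_CHIP = 32

# Assumption: chips are spread evenly across servers.
chips_per_server = CHIPS // SERVERS
total_cores = CHIPS * TENSIX_CORES_PER_CHIP
total_memory_gb = CHIPS * GDDR6_GB_PER_CHIP

print(f"chips per server: {chips_per_server}")
print(f"total Tensix cores: {total_cores}")
print(f"total GDDR6 memory: {total_memory_gb} GB")
```

Under those assumptions the cluster comes to 64 chips per server, 30,720 Tensix cores, and 8,192 GB of GDDR6 in aggregate.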

Tenstorrent CEO Jim Keller told EE Times the company is working with video-focused customers, including some of the big frontier AI labs. He did not name them. The claim is unverifiable at this stage, and it is the most significant gap in what Tenstorrent is presenting. A demo that works in a controlled visit is one thing. Production deployment at a frontier lab is another.

Jasmina Vasiljevic, a senior fellow at Tenstorrent, framed the milestone in terms of what breaking real-time unlocks. Denoising — the sequential process that makes diffusion-based video generation computationally expensive — has been the bottleneck. Autoregressive models that predict frames the way language models predict tokens are emerging as an alternative approach. Tenstorrent's architecture, she said, is designed to handle both. "We're excited because [Tenstorrent] is good at both," Vasiljevic said.

The company's software stack is fully open-source, a deliberate contrast to Nvidia's CUDA ecosystem. Customers can inspect the runtime, modify the compiler, and deploy without per-chip licensing fees. The Wan2.2 model itself is open, available on GitHub. If the speed claims hold up under independent testing, the combination of open hardware and open software would give the field an option it has not had a clean path to before.

That's the condition. If. The 10x speedup is Tenstorrent's own framing, measured against hardware the company selected. No independent benchmark has been published. The Galaxy cluster — 256 chips in four interconnected servers — has a full launch planned for next week, when pricing, availability, and customer deployments will presumably become concrete. What Tenstorrent has shown is a working system doing something no other system has publicly demonstrated at this scale. Whether it is a product inflection or a well-executed demo is the question that matters.

For builders watching the AI infrastructure market, the stakes are clear. GPU rental costs have become a structural line item for every AI startup. If a cluster built on standard ethernet, general-purpose cores, and an open software stack can genuinely match or beat GPU clusters on video generation workloads, the economics of AI infrastructure look different. If it can't — if the 10x claim dissolves under independent measurement — then this is another data point in a long history of AI hardware announcements that impressed in demos and never shipped at scale.

Tenstorrent has until now been known primarily for developer kits and smaller-scale deployments. The Galaxy cluster raises the scale of what it is claiming. What the company has not yet closed is the verification gap between a demo and a product.