NVIDIA Says RTX Spark Delivers 1 Petaflop. It Won't Say at What Precision.
NVIDIA's new RTX Spark chip for Windows laptops makes a simple claim: 1 petaflop of AI compute. What it has not disclosed is which of the industry's standard precision formats produced that number.
Flops measure raw computational speed. Precision formats determine how many bits each number gets, and therefore both the speed and the accuracy of the result. FP32, often called full precision in AI work, stores each number in 32 bits. FP16 uses half as many bits, FP8 halves that again, and FP4 halves it a third time. Smaller formats run faster and need less memory, but they represent numbers more coarsely, and how much accuracy that costs depends on the workload. A chip rated at 1 petaflop at FP4 is not the equivalent of one rated at 1 petaflop at FP16: if throughput roughly doubles each time precision halves, the FP4 chip would manage only about a quarter of a petaflop at FP16, and FP4 suits only workloads, chiefly quantized inference, that tolerate its coarse rounding. Knowing which flops you are counting is the difference between a spec sheet and a slogan. And that is exactly what NVIDIA has not told buyers about RTX Spark.
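To make "coarser" concrete, here is a minimal Python sketch that rounds a few sample values to FP16 and to FP4. It assumes the E2M1 layout for FP4 (1 sign bit, 2 exponent bits, 1 mantissa bit), whose only representable magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4, and 6. The helper names are hypothetical, and real FP4 deployments pair these values with per-block scale factors that this sketch deliberately omits.

```python
import struct

# Magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
# Assumption: plain FP4 with no per-block scaling, saturating at +/-6.
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision using struct's 'e' format code."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp4(x: float) -> float:
    """Round x to the nearest FP4 E2M1 value; ties resolve to the smaller magnitude here."""
    magnitude = min(FP4_E2M1_GRID, key=lambda v: abs(v - abs(x)))
    return magnitude if x >= 0 else -magnitude


# Compare how far each format moves a handful of sample values.
for v in [0.1, 0.7, 2.4, 5.1, 3.14159]:
    print(f"{v:>8}  fp16={to_fp16(v):.6f}  fp4={to_fp4(v)}")
```

FP16 lands within about 0.05 percent of each sample, while FP4 snaps 0.1 to 0 and 5.1 to 6. Block scaling schemes exist precisely to keep values inside the narrow range where FP4 remains useful.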
Jensen Huang unveiled RTX Spark at GTC Taipei on May 31, 2026, alongside Microsoft and OEM partners ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI, with systems due this fall. The chip pairs a 20-core Arm-based Grace CPU, co-designed with MediaTek, with a Blackwell GPU carrying 6,144 CUDA cores and fifth-generation Tensor Cores. It ships with up to 128GB of unified memory, and NVIDIA says it can run 120-billion-parameter language models with a 1-million-token context window locally, a workload that today typically requires cloud infrastructure or a dedicated workstation.
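A rough arithmetic check shows why precision matters even for the memory claim. The Python sketch below counts only the bytes needed to store 120 billion weights at each precision, treating 128GB as 128 × 10^9 bytes. It is an assumption-laden lower bound: it ignores the key-value cache a 1-million-token context would need, activations, and the operating system, all of which compete for the same unified memory.

```python
# Back-of-envelope sketch: memory for model weights alone, ignoring KV cache,
# activations, and the OS. Parameter count and memory size are from the announcement.
PARAMS = 120e9      # 120 billion parameters
MEMORY_GB = 128     # advertised maximum unified memory, treated as 128 * 10^9 bytes

BITS_PER_PARAM = {"FP32": 32, "FP16": 16, "FP8": 8, "FP4": 4}


def weight_gb(params: float, bits: int) -> float:
    """Gigabytes (10^9 bytes) occupied by the weights alone at a given bit width."""
    return params * bits / 8 / 1e9


for fmt, bits in BITS_PER_PARAM.items():
    gb = weight_gb(PARAMS, bits)
    headroom = MEMORY_GB - gb
    status = f"{headroom:.0f} GB left" if headroom > 0 else f"{-headroom:.0f} GB over"
    print(f"{fmt}: {gb:.0f} GB of weights, {status}")
```

Only the 8-bit and 4-bit rows leave any headroom, and 8-bit leaves just 8GB. This does not reveal the precision behind the petaflop figure, but it suggests the 120-billion-parameter claim depends on heavily quantized weights.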
Investors read it as a credible threat to the x86 incumbents. Intel fell 6 percent and AMD slipped 5 percent, while NVIDIA gained 4 percent and Microsoft climbed 3 percent. The trade was not subtle.
What makes this more than a hardware refresh is the software layer underneath. OpenClaw and Nous Research's Hermes Agent, which crossed 140,000 GitHub stars in under three months and became the most-used agent on OpenRouter, are both adopting new Windows security primitives and NVIDIA's OpenShell runtime. OpenShell gives users explicit policy control over what agents can do, lets them route queries to local or cloud models based on privacy preferences, and can mask personal information before anything is sent to a remote endpoint. For a market that has spent the last two years debating whether AI agents should have access to your files, your calendar, and your applications, this makes the infrastructure question concrete.
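The routing behavior described above can be sketched in a few lines. This is a hypothetical Python illustration, not OpenShell's API, which the announcement does not document: the Policy fields, the keyword rule, and the regular expressions for email addresses and phone numbers are all assumptions, and a production redactor would need far broader coverage.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for personal data; real redaction needs much more coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


@dataclass
class Policy:
    """Hypothetical user policy; field names are illustrative, not OpenShell's."""
    allow_cloud: bool = True
    private_keywords: tuple = ("medical", "salary", "password")


def mask_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def route(query: str, policy: Policy) -> tuple[str, str]:
    """Return (destination, payload): keep sensitive queries local, mask the rest for cloud."""
    lowered = query.lower()
    if not policy.allow_cloud or any(k in lowered for k in policy.private_keywords):
        return "local", query  # stays on the device, so it is passed through unmasked
    return "cloud", mask_pii(query)
```

With the default policy, a request to email a draft to jane.doe@example.com is sent to the cloud as "Email [EMAIL] the draft", while a query mentioning medical results never leaves the machine. Masking only on the cloud path keeps the local model working with the full text.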
The precision question is not nitpicking. Performance claims for AI inference at different precisions can differ severalfold on the same hardware. NVIDIA's own documentation for the Blackwell architecture shows FP4 Tensor Core throughput at roughly double FP8 and four times FP16 on the matrix operations typical of transformer inference, and NVIDIA often quotes Tensor Core peaks with structured sparsity, which doubles the figure again. Stacked together, those factors let the same silicon produce headline numbers nearly an order of magnitude apart. If RTX Spark's 1-petaflop headline is an FP4 figure, it says something very different about sustained real-world AI performance than if it is an FP16 figure. Other Blackwell-family chips have disclosed FP4 support for their Tensor Cores; the RTX Spark announcement did not specify which precision format produces the 1-petaflop figure. The comparison problem is real: Apple publishes its Neural Engine performance at both INT8 and FP16, AMD's Strix Point NPU discloses 50 TOPS at Block FP16, and Intel's Lunar Lake NPU specifies 48 TOPS at INT8. All three disclose their precision targets. NVIDIA has not done so for RTX Spark.
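The spread is easy to quantify under stated assumptions. The Python sketch below converts a 1-petaflop headline into an estimated dense FP16 figure, assuming throughput doubles with each halving of precision and that structured sparsity, if counted, doubles the quoted number again. Those ratios roughly follow NVIDIA's published pattern for datacenter Blackwell parts but are unconfirmed for RTX Spark, and the function name is hypothetical.

```python
# Assumed scaling factors, not confirmed for RTX Spark: dense throughput roughly
# doubles each time precision halves, and 2:4 structured sparsity doubles the
# quoted figure again.
SPEEDUP_VS_FP16 = {"FP16": 1, "FP8": 2, "FP4": 4}
SPARSITY_FACTOR = 2


def dense_fp16_equivalent(headline_pflops: float, precision: str, sparse: bool) -> float:
    """Estimate dense FP16 petaflops behind a headline figure, under the assumptions above."""
    value = headline_pflops / SPEEDUP_VS_FP16[precision]
    return value / SPARSITY_FACTOR if sparse else value


for precision in ("FP16", "FP8", "FP4"):
    for sparse in (False, True):
        tflops = dense_fp16_equivalent(1.0, precision, sparse) * 1000
        label = f"{precision} (sparse)" if sparse else precision
        print(f"1 PF at {label}: ~{tflops:.0f} TFLOPS dense FP16")
```

Under those assumptions, the same 1-petaflop label could mean anything from about 1,000 down to about 125 teraflops of dense FP16 compute, an eightfold range.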
Microsoft's Azure infrastructure adds context that cuts both ways. The company announced it is deploying hundreds of thousands of Blackwell GPUs across its AI data centers in GB200 NVL72 rack-scale systems, with Azure ND GB200 v6 virtual machines delivering what it claims is 35 times the inference throughput of the previous H100-based generation. That lends RTX Spark credibility, since the same NVIDIA architecture that runs your data center now runs your laptop, but it also invites a cloud-versus-edge comparison that the announcement deliberately blurs. Whether that architectural consistency translates into comparable performance at the edge, where memory bandwidth, thermal headroom, and power budgets all fall far short of data center norms, is an open question.
The Arm question is the longer fuse. Windows on Arm has been attempted before. Windows RT, launched in 2012, was a well-documented failure: it shipped without support for legacy x86 applications and died within eighteen months. Emulation has improved since then, but x86 emulation on Arm-based Windows has historically imposed meaningful performance penalties on computationally intensive workloads. NVIDIA and Microsoft have not published emulation benchmarks for RTX Spark. For existing Windows applications built for x86, the compatibility and performance of that emulation layer will determine whether RTX Spark can replace an x86 workstation or only sit beside one. Until the benchmarks exist, the 1-petaflop figure is the only number on the table, and without its precision target, it is a number without a unit.