ROBNEWS

NVIDIA Found the Scaling Law for Robot Fingers

reported by Samantha · 4 min read · published April 17, 2026

PREVIEWNVIDIA Found the Scaling Law for Robot Fingers · MD

NVIDIA dropped something real on Hugging Face last week. The company released GR00T N1.7, and the claim is not another benchmark victory. It is a scientific result: NVIDIA says it has found the scaling law for robot fingers.

The paper, EgoScale (arXiv 2602.16710, February 2026), describes a log-linear relationship between how much human egocentric video a robot model trains on and how well its hands work. The correlation coefficient is R²=0.9983. That is not a marketing number. It is a fit line drawn through real data, and the researchers went further — they showed that the validation loss they measured predicts actual performance on a physical robot. More video of humans doing skilled hand work means better robot dexterity, consistently, across hardware platforms. The 54% improvement in average task success rate over a no-pretraining baseline was measured on a 22-degree-of-freedom dexterous robotic hand.

The training set is 20,854 hours of egocentric human video spanning 20 task categories, from factory floors to retail to healthcare to homes. The model is a 3 billion parameter Vision-Language-Action architecture using an Action Cascade dual-system design: a Cosmos-Reason2-2B vision-language model handles high-level reasoning while a 32-layer Diffusion Transformer produces low-level motor commands. GR00T N1.7 is Apache 2.0 licensed and validated on Unitree G1, Bimanual Manipulator YAM, and AGIBOT Genie 1. For factories already running N1.6, it is a a drop-in swap.

The authors include researchers from NVIDIA, UC Berkeley, and the University of Maryland. Trevor Darrell and Yuke Zhu are among the names that will be familiar to anyone following Berkeley's robotics work. The paper was submitted to arXiv on February 18.

The scaling law framing is what separates this from the parade of "robotics GPT moment" announcements that wash through my inbox every few weeks. This paper makes a falsifiable claim: if you add more human video of a task, the robot gets better at that task, predictably. You do not have to take NVIDIA's word for it. You can look at the data. The correlation between training hours and validation loss is not a proprietary secret — it is plotted in Figure 2.

What the paper does not specify is where 20,854 hours of human hands doing skilled labor actually came from. Humanoids Daily reported that factory worker videos were part of the dataset. The paper names task categories including manufacturing. It does not say whether the people who generated that footage were asked, compensated, or aware. I have asked NVIDIA for clarification on the dataset composition and consent process and will update if I hear back.

This matters for the claim's reach. If the scaling law holds only because NVIDIA had access to proprietary factory footage that competitors cannot replicate, the "scaling law" framing starts to look like a moat construction exercise dressed up as a scientific result. If it holds on publicly available video plus licensed data, it is what the paper says it is: a method that works.

The factory floor angle is where the story gets physical. GR00T N1.7 is positioned for production deployments, not lab demos. The Action Cascade architecture is explicitly designed for contact-rich assembly tasks — the kind of fine motor work that has kept human hands employed in logistics and manufacturing even as larger-scale automation ate away at other job categories. NVIDIA is not being subtle about the target market.

The 54% success rate improvement sounds dramatic until you ask: improvement over what baseline, measured how, on which tasks? The paper says it is improvement over no pretraining at all, using a 22-DoF hand on a set of dexterous manipulation tasks. That is a legitimate comparison, but it is also the comparison that makes the number look largest. A robot that has never seen human video versus one that has seen 20,000 hours of it is not the same as a robot that has seen 5,000 hours versus one that has seen 20,000. The scaling curve is log-linear, which means the gains from the first 1,000 hours are larger than the gains from the last 10,000.

None of this negates the core result. A scaling law for dexterity is a real thing, it appears to hold across different robot embodiments, and NVIDIA has released the weights under a license that lets companies actually use it. That is more than most robotics research papers attempt.

The question worth sitting with is who controls the footage. If more human video data predictably produces better robot hands, then the bottleneck is not actuator design or tactile sensors or any hardware problem the robotics community has been grinding on for twenty years. The bottleneck is who has filmed people working, and on what terms. NVIDIA just told the industry where to look.

I am checking on the consent and compensation question. That is the thread that runs from this paper to the factory floor where someone was filmed holding a component or folding a garment. Until I know whether that person was asked, the scaling law has a gap in it.

NVIDIA Found the Scaling Law for Robot Fingers

Sources