The $3 Billion Bet on Robot Data That Barely Exists
The robotics industry is about to spend $3 billion on a data problem it cannot yet solve.
That is the real bet embedded in every humanoid robot funding round, every foundation model launch, and every "fleet learning" pitch circulating through venture portfolios right now. The companies building these systems expect to pour more than $3 billion into robot data over the next two years, according to Bessemer Venture Partners' robotics thesis, chasing a supply of training footage that currently totals roughly 300,000 hours worldwide. To put that in perspective: roughly 1 billion hours of video get uploaded to YouTube every year, and frontier language models were trained on tens of trillions of tokens of text. Robot manipulation footage is not rare the way rare earth metals are rare, where scarcity is a physical constraint. It is rare in the sense that nobody has systematically recorded it at scale, with the labels that make it useful for training a machine to imitate human motion.
The reason the money is following anyway comes down to a relationship researchers have now quantified with unusual precision. NVIDIA's EgoScale paper, published in February, showed that robot performance improves in a predictable, near-linear way as you feed the system more hours of human egocentric video — the kind of footage that captures a person's hands doing a task from their own point of view. The correlation between data scale and task success is so clean it reads like a physics constant: an R² of 0.9983, with a 54% improvement over a no-pretraining baseline on a 22-degree-of-freedom dexterous hand. That is the scaling law in plain terms: more varied examples of human motion produce more reliable imitation learning, following a curve you can actually predict. It is the same pattern that made large language models behave more reliably as compute and data increased — now shown for the physical domain.
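To make "a curve you can actually predict" concrete, here is a minimal sketch of the exercise behind that R² figure. The hour counts and success rates below are invented for illustration, not values from EgoScale; the point is the shape of the analysis: fit task performance against pretraining hours, measure how tightly the fit holds, then extrapolate.

```python
import numpy as np

# Illustrative numbers only; these are not EgoScale's measurements.
hours = np.array([1_000, 2_500, 5_000, 10_000, 20_000])   # pretraining hours of egocentric video
success = np.array([0.31, 0.38, 0.46, 0.55, 0.63])        # task success rate on a held-out benchmark

# Fit success rate as a linear function of log(hours), a common
# functional form for data scaling curves.
x = np.log(hours)
slope, intercept = np.polyfit(x, success, 1)
pred = slope * x + intercept

# R^2: how much of the variation in performance the fitted curve explains.
ss_res = np.sum((success - pred) ** 2)
ss_tot = np.sum((success - success.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"success ~ {slope:.3f} * ln(hours) + {intercept:.3f}  (R^2 = {r_squared:.3f})")

# The bet: extrapolate the same curve to data nobody has collected yet.
print(f"predicted success at 50,000 hours: {slope * np.log(50_000) + intercept:.2f}")
```

The last line is where the $3 billion lives: the wager is that the extrapolation keeps working well past the range anyone has actually measured.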
The scaling law is real and it is the reason $3 billion is flowing toward a dataset that barely exists. When improvement is predictable, capital follows. PitchBook recorded $27.6 billion across 1,009 robotics deals in 2025, with industrial robotics up 70 percent year over year, according to AI to ROI. Goldman Sachs projects the humanoid robot market will reach $38 billion by 2035, with annual shipments exceeding 1.4 million units. Every investor who has been burned backing a robotics company with a great demo and no data strategy is now demanding to see the footage pipeline before writing a check.
This is why fleet learning has become the phrase that should be in every robotics investor's pitch template. A robot that performs a task in one facility and learns from that performance, then shares that knowledge across every other robot in the fleet, is a data collection engine as much as it is a machine. Every deployment becomes a training run. Every edge case encountered in the field gets labeled and fed back into the next model iteration. The company that figures out how to do this reliably at commercial scale is not just building robots. It is building the infrastructure layer of the entire industry.
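As a rough sketch of what that loop looks like as software, the structure below is a conceptual illustration; the class and function names are assumptions made for this example, not any company's actual pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Episode:
    """One task attempt recorded by one robot in the field."""
    robot_id: str
    task: str
    observations: bytes                          # raw sensor log from the deployment
    succeeded: bool
    labels: dict = field(default_factory=dict)   # filled in later by the labeling step

@dataclass
class FleetDataset:
    """Shared corpus that every deployed robot feeds."""
    episodes: List[Episode] = field(default_factory=list)

    def ingest(self, episode: Episode) -> None:
        # Every deployment becomes a training run: field data flows
        # straight into the shared corpus.
        self.episodes.append(episode)

    def edge_cases(self) -> List[Episode]:
        # Failures are the valuable part; they set the next iteration's curriculum.
        return [e for e in self.episodes if not e.succeeded]

def training_cycle(
    dataset: FleetDataset,
    label: Callable[[Episode], dict],
    train: Callable[[List[Episode]], object],
    deploy: Callable[[object], None],
) -> None:
    """One turn of the loop: label field failures, retrain, push to the whole fleet."""
    for episode in dataset.edge_cases():
        episode.labels = label(episode)   # human or automated labeling
    new_model = train(dataset.episodes)   # retrain on everything collected so far
    deploy(new_model)                     # every robot in the fleet gets the update
```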
Physical Intelligence, the Alphabet-backed lab behind the π0 model, has said as much plainly. Figure, 1X, and Apptronik have said it less plainly but signaled it in how they describe their data strategy. The humanoid robot companies that look most defensible are the ones that have thought hardest about where their training data comes from in year three, not just year one.
The synthesis angle, using physics simulators to generate synthetic robot manipulation data, is both promising and oversold. NVIDIA's own Cosmos platform is built partly around the idea, and sim-to-real transfer is a legitimate research direction. But the 54% improvement EgoScale showed over a no-pretraining baseline was demonstrated on a specific embodiment doing contact-heavy tasks in controlled conditions. Whether that scaling curve holds across different robot bodies, different task distributions, and real factory floors is still an open empirical question. The law is clean in the lab. The lab is not the warehouse.
There is also the harder problem the papers do not address: who is in the footage. NVIDIA's EgoScale acknowledgments thank the robot operators by name, a gesture toward transparency that is not the same as consent or compensation. The 20,854 hours of human hands doing skilled labor in the pretraining set came from somewhere. Nothing in the paper as published suggests the workers whose motion data was captured were offered a revenue share or a release form. That is not a criticism of NVIDIA specifically. It is a structural feature of a field that is about to spend $3 billion building systems trained on human motion, with no established norm for where that motion comes from or who gets paid for it.
None of this changes the direction of travel. The scaling law is real. The investment is real. The labor market pressures driving demand for automation are demographic and structural, not cyclical. Robots will get better, and the companies that control the data infrastructure feeding that improvement — the pipelines, the labeling systems, the fleet learning loops — will capture a disproportionate share of the value.
For builders and investors, the implication is straightforward if unglamorous: the team that can reliably collect, label, and pipeline robot manipulation data at scale is worth more than the team that has the best architecture paper. The moat is not the model. The moat is the footage.