The text corpus that trained today's most capable language models would take one human roughly 100,000 years to read. The corpus available to train robot manipulation systems, the data that lets a machine grip a cup, sort a bin, or palletize a mixed case, amounts to a few years' worth by the same measure. Ken Goldberg, a UC Berkeley professor, put a name on the difference at ICRA 2026, the IEEE's flagship robotics conference: a "100,000-year data gap" between the data that trains language models and the data that trains robots. That gap, he argued, is why the embodied-AI field, the branch of AI that teaches machines to act in the physical world, cannot simply scale its way to general-purpose robots the way large language models scaled to chatbots.
His proposed answer is a hybrid with two parts. The first is "Code-as-Policy," the idea that a large language model should not directly emit motor commands for a robot arm. Instead, it should write inspectable, verifiable code describing the task as a computational graph that other agents and downstream systems can read, test, and repair. The second is "GOFE," a foundation-model layer for embodied tasks that sits above the code-writing model and coordinates multiple agents as they co-construct and optimize that graph. The combination, Goldberg argued, gives embodied-AI teams a third option beyond the binary of "more data" or "more hand-engineering": a hybrid stack that draws on language models for what they do well and classical methods for what they do reliably.
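To make the "task as a computational graph" idea concrete, here is a minimal Python sketch of what inspectable, checkable task code could look like. It is an illustration only, not the CaP-X codebase or Goldberg's implementation: the `Step` class, the `plan_order` and `run` helpers, and the cup-handling step names are all hypothetical. The point it shows is that a graph of named steps can be validated, for unknown dependencies or cycles, before any robot motion happens.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, List


@dataclass
class Step:
    """One node in a hypothetical task graph."""
    name: str
    action: Callable[[dict], None]  # mutates a shared state dict
    deps: List[str] = field(default_factory=list)


def plan_order(steps: List[Step]) -> List[str]:
    """Validate the graph and return a dependency-respecting order."""
    names = {s.name for s in steps}
    for s in steps:
        for d in s.deps:
            if d not in names:
                raise ValueError(f"step {s.name!r} depends on unknown step {d!r}")
    sorter = TopologicalSorter({s.name: set(s.deps) for s in steps})
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        raise ValueError(f"task graph has a cycle: {exc.args[1]}") from exc


def run(steps: List[Step], state: dict) -> dict:
    """Execute steps in validated order; each action only logs here."""
    by_name: Dict[str, Step] = {s.name: s for s in steps}
    for name in plan_order(steps):
        by_name[name].action(state)
    return state


def log_step(label: str) -> Callable[[dict], None]:
    """Stand-in for a real skill: records that the step ran."""
    def action(state: dict) -> None:
        state.setdefault("log", []).append(label)
    return action


# Steps are listed out of order on purpose; the graph fixes the order.
steps = [
    Step("place_cup", log_step("place_cup"), deps=["grasp_cup"]),
    Step("locate_cup", log_step("locate_cup")),
    Step("grasp_cup", log_step("grasp_cup"), deps=["locate_cup"]),
]

if __name__ == "__main__":
    print(plan_order(steps))
    print(run(steps, {})["log"])
```

Because the plan is ordinary data plus ordinary functions, a second agent or a test harness can inspect it, reject a malformed graph, or swap one step for a repaired version without touching the rest.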
The argument is not that data is useless. Goldberg's own Dex-Net project, a roughly decade-old effort to train neural networks on large simulated grasp datasets and then deploy them on real robots, is part of why he speaks with confidence. Dex-Net worked because it combined a large synthetic dataset with classical engineering, including a probabilistic model of grasp success that scored candidate grasps before the robot attempted them. He treats that decade-old result as evidence that the data-plus-engineering hybrid is older than the current scaling debate, and the same logic, he suggested, applies now: pure scaling is not a substitute for structure.
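The "score candidate grasps before attempting them" pattern can be sketched in a few lines of Python. This is not Dex-Net's model: the two features, the logistic form, the weights, and the 0.7 threshold below are all made-up assumptions chosen for illustration. The sketch shows only the structure, in which a probabilistic success estimate gates which grasp, if any, the robot tries.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Grasp:
    label: str
    friction_margin: float   # hypothetical feature: higher is more secure
    center_offset_cm: float  # hypothetical feature: distance from center of mass


def success_probability(g: Grasp) -> float:
    """Toy logistic model with invented weights (not learned from data)."""
    z = 2.0 * g.friction_margin - 0.8 * g.center_offset_cm
    return 1.0 / (1.0 + math.exp(-z))


def select_grasp(candidates: List[Grasp], min_p: float = 0.7) -> Optional[Grasp]:
    """Return the highest-scoring grasp, or None if none clears the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=success_probability)
    return best if success_probability(best) >= min_p else None


candidates = [
    Grasp("rim_pinch", friction_margin=0.5, center_offset_cm=2.0),
    Grasp("body_wrap", friction_margin=1.5, center_offset_cm=0.5),
]

if __name__ == "__main__":
    for g in candidates:
        print(g.label, round(success_probability(g), 3))
    print("chosen:", select_grasp(candidates))
```

Returning `None` when no candidate clears the threshold is the engineering half of the hybrid: the system can decline to act, re-image the scene, or ask for help rather than attempt a grasp it expects to fail.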
What is broken, in Goldberg's framing, is the vision-language-action model, or VLA, the class of embodied-AI system that ingests camera images and natural-language instructions and outputs robot motion directly. VLAs generalize impressively in controlled demos and break the moment a warehouse floor throws a small environmental change at them: a new lighting angle, a slightly shifted tote, a piece of tape on a bin. Classical engineered systems, the kind that industrial automation has shipped for decades, do not generalize but are reliable. Goldberg frames the two as complementary, with Code-as-Policy as the bridge that lets a language model contribute the parts it is good at, planning and code synthesis, without inheriting the parts it is bad at, low-level real-time motor control in unstructured environments.
The bridge is not just a slide. The research substrate behind the talk is already public: an arXiv preprint titled "CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation" and a companion GitHub repository for the capgym/cap-x codebase. CaP-X is a benchmark, not a finished product, and Goldberg used the plenary to argue that the embodied-AI community should treat coding agents the way the language-model community treated chain-of-thought prompting a few years ago, as a measurable, improvable primitive rather than a parlor trick.
There is industrial context, too, though it is supporting rather than central. Goldberg is a co-founder of Jacobi Robotics, a Berkeley spinout that has used classical motion-planning software in industrial deployments, and the company has spent the spring of 2026 announcing partnerships with ABB Robotics, FORTNA, and Peak Technologies. The deals are a reminder that the "engineering still matters" half of Goldberg's argument is not a thought experiment. It is the part of the stack that is already moving pallets in real warehouses.
The caveats are worth naming. The "100,000-year" figure is a rhetorical comparison, not a measurement of how much robot data exists. The CaP-X paper is a preprint, not peer-reviewed work, and its numbers should be treated accordingly. Goldberg's framing of the data gap is a single voice on a conference stage, and the talk has so far reached English-language readers by way of a Chinese-language conference report from Leiphone; the talk recording and slides, once posted, will be the primary source.
What to watch next is concrete. If Goldberg and his collaborators use CaP-X to publish hard, reproducible deltas showing that Code-as-Policy pipelines recover most of the reliability VLAs lose to small environmental changes on real warehouse tasks, the hybrid stack moves from keynote argument to engineering roadmap. If the benchmark instead becomes one more entry in the language-model-agent literature, the embodied-AI field will have heard a strong critique without a follow-through, and the 100,000-year gap will still be waiting.