The Sim2Real Trap: Why Chasing Reality Is Starving Robot Policy Learning

For a decade, sim-to-real transfer has been the robotics field's loudest success story. Train a policy in simulation, close the visual and physical gap to the real robot, ship it. The results are real: locomotion policies, manipulation skills, and dexterous hand control that would have been infeasible to learn directly on hardware just a few years ago. By almost any visible benchmark, the paradigm is winning.

A new position paper on arXiv, "Too Much of a Good Thing: When sim2real Efforts Impede Policy Learning (And What to Do About It)," by Kyle Morgenstein, Bharath Masetty, and Stephen Welch of Apptronik and Luis Sentis of the University of Texas at Austin, argues that the visible wins may be hiding a quieter failure, one rooted in the field's incentive structure rather than in any specific algorithm. The authors do not call for abandoning sim2real. Their argument is sharper than that: the field's success at closing the sim-to-real gap has produced a kind of simulator lock-in, a state in which the simulator's design choices constrain what the policy can ever learn.

The diagnosis: too much of a good thing

The paper's framing begins with a concession. Sim2real transfer is, the authors grant, genuinely necessary: real robots are slow, expensive, and unsafe to thrash through millions of trials. Photorealistic simulation, accurate contact dynamics, and careful domain randomization are the tools that made modern robot learning possible. None of that is in dispute.

What is in dispute is what the field has traded away to get it. The authors' central claim is that the same real-world constraints that justify sim2real in the first place — bounded robot lifetimes, careful hardware budgets, the cost of catastrophic failure — have bled into the simulator's design choices. A simulator that must produce physically credible rollouts at scale ends up inheriting the world's constraints, even though the world itself is the thing the policy is supposed to learn about.

The result, in the paper's terms, is starved policy exploration. The space the policy can meaningfully probe is narrowed not by what the robot can do, but by what the simulator was built to approximate. The exploration that would have led to genuinely new behaviors is filtered out long before any real hardware is ever touched. The simulator locks in a conservative slice of the world, and the policy learns to live inside it.

This is the structural complaint. It is not that sim2real fails at the benchmark it was designed for — the field's published results suggest it does not. It is that the incentives surrounding sim2real work have rewarded closing the gap faster than they have rewarded asking what the gap is closing over. Photorealism, contact fidelity, and visual domain randomization are easier to measure than exploration breadth, so the field has optimized for them.

The proposal: sim2sim2real, with the robot as the only constraint

The paper's constructive move is to introduce a paradigm it calls sim2sim2real. The idea is a clean inversion of the current default. Rather than designing the simulator to approximate the world and then transferring the resulting policy, the simulator is designed around the robot's own kinematics — what the hardware can actually do, expressed in its own coordinate frame.

Photorealism, scene complexity, and contact accuracy are deliberately stripped out of the training-time simulator. The only constraint carried into training is the one that physically cannot be removed: the robot's body. Policy learning happens in a simulator that does not pretend to be the world. The world is reintroduced only at the very end, when a policy learned against the robot's kinematics is transferred to the real hardware.

The bet is that decoupling policy learning from the world's constraints — while still preserving the final transfer step — recovers the exploration breadth that current sim2real practice has been quietly erasing. The simulator stops trying to be a small version of reality and becomes, instead, a faithful description of the robot inside any reality.

A paradigm proposal, not a benchmark result

It is worth being precise about what this paper is and is not. The abstract describes it as a "diagnosis and explanation" plus a "potential solution." The paper was posted as v1 on 30 May 2026 and revised as v2 on 3 June 2026. It is listed under cs.RO and cs.AI, and it is a preprint that has not yet been peer-reviewed. Its four authors frame sim2sim2real as a hypothesis the field is invited to test, not a method with benchmark validation.

That matters for how the proposal should be read. Whether stripping the simulator down to robot kinematics recovers the exploration the authors say has been lost remains an open question. The paper offers no benchmark-quality experiments, ablations, or standalone real-robot demonstrations. It does include a case study in Section IV that describes the approach working on an Apptronik Apollo humanoid robot, with policy training in IsaacLab and zero-shot transfer to MuJoCo before hardware deployment. The constructive core of the paper is real all the same: it names a specific failure mode (simulator lock-in) with a specific mechanism (inherited real-world constraints narrowing exploration), and it pairs the diagnosis with a concrete counter-proposal rather than a complaint.
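
The train-in-one-simulator, check-in-another step mentioned in the case study can be illustrated with a toy sketch. This is not the authors' pipeline and it uses no IsaacLab or MuJoCo APIs. It assumes a hypothetical 1-D joint, two made-up simulator configurations (`SIM_A`, `SIM_B`), a one-parameter policy, and an arbitrary pass threshold, purely to show the shape of a zero-shot sim2sim gate before hardware.

```python
import random

# Two toy "simulators" for the same 1-D joint. Every value here is hypothetical.
SIM_A = {"damping": 0.1, "dt": 0.01, "steps": 1000}   # training simulator
SIM_B = {"damping": 0.5, "dt": 0.005, "steps": 2000}  # held-out simulator, same 10 s horizon


def rollout(gain, sim):
    """Drive the joint from 0 toward 1.0 and return the final absolute error."""
    pos, vel = 0.0, 0.0
    for _ in range(sim["steps"]):
        torque = gain * (1.0 - pos) - 2.0 * vel   # policy: proportional term plus fixed damping
        acc = torque - sim["damping"] * vel       # simulator: unit mass with viscous friction
        vel += acc * sim["dt"]                    # semi-implicit Euler step
        pos += vel * sim["dt"]
    return abs(1.0 - pos)


def train(sim, trials=50, seed=0):
    """Random search over the policy gain, using only the training simulator."""
    rng = random.Random(seed)
    candidates = [rng.uniform(1.0, 30.0) for _ in range(trials)]
    return min(candidates, key=lambda g: rollout(g, sim))


def sim2sim_gate(gain, tolerance=0.05):
    """Evaluate the unchanged policy in the second simulator; True means it may proceed."""
    return rollout(gain, SIM_B) < tolerance


best_gain = train(SIM_A)                      # the policy never sees SIM_B during training
err_a = rollout(best_gain, SIM_A)
err_b = rollout(best_gain, SIM_B)
ready_for_hardware = sim2sim_gate(best_gain)  # gate on zero-shot performance in SIM_B
```

The design point is only that the second simulator acts as a cheap, independent check: a policy that depends on quirks of the training simulator's dynamics or step size has a chance to fail there, before any real robot is involved.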

Why the diagnosis is the news

The arXiv submission is four pages. Its lasting contribution is unlikely to be a specific architecture or a state-of-the-art result. Its contribution is the reframing: sim2real has not failed on its own terms; it has succeeded narrowly, and the narrowness is the problem. The field has been rewarded for transfer fidelity rather than for the breadth of behavior transfer enables, and that is a governance question as much as a technical one.

For a robotics community that has spent the last several years measuring itself on closing the sim-to-real gap, that is an uncomfortable question to sit with. It is also, by the authors' own framing, the right one. Sim2real is still necessary. The argument is that it has been allowed to become sufficient, and that the cost of that sufficiency has been paid in policies that never had the chance to explore widely enough to learn anything the simulator was not already designed to permit.

The sim2sim2real proposal is the bet the paper is asking the field to make. Whether the bet pays off is the next thing the literature will have to say.
