New RL Method Turns Robot Pretraining Into Real-World Manipulation Skills

Researchers at Columbia University have developed a new approach that takes pretrained robot policies and turns them into reliable manipulation skills using reinforcement learning. The method, called DICE-RL, achieved success rates above 90% on three challenging real-world tasks, according to a paper submitted to arXiv on March 10, 2026.
The approach addresses a persistent problem in robotics: pretrained models often capture broad behavioral patterns but struggle with specific real-world tasks. DICE-RL uses RL as what the researchers describe as a "distribution contraction" operator, essentially amplifying the most successful behaviors from a pretrained policy while maintaining the diversity that makes the original model flexible.
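To make the "contraction" intuition concrete, here is a toy sketch of one plausible reading (our interpretation, not the paper's code): resample candidate actions with probability proportional to exp(Q / temperature), so a low temperature concentrates the action distribution on high-value behaviors while a high temperature preserves the pretrained policy's diversity. The inputs `candidates` and `q_values` are hypothetical.

```python
import torch

def contract(candidates: torch.Tensor, q_values: torch.Tensor,
             temperature: float = 0.5) -> torch.Tensor:
    """Resample one action, weighted by value (toy contraction operator)."""
    # Softmax over Q-values: sharper as temperature -> 0 (more contraction),
    # flatter as temperature grows (closer to the pretrained distribution).
    weights = torch.softmax(q_values / temperature, dim=0)
    idx = torch.multinomial(weights, num_samples=1)
    return candidates[idx.item()]
```

In the limit of zero temperature this reduces to greedy selection; at high temperature it recovers the original sampling distribution, which is one way to trade off exploitation against the pretrained model's flexibility.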
The Results
On a real robot performing belt assembly, success rates jumped from 56.67% to 93.33% over 30 trials. Light bulb insertion improved from 56.67% to 90% success, and gear insertion rose from 46.67% to 90%.
Notably, the system works directly from high-dimensional pixel inputs, eliminating the need for explicit state estimation—a technical hurdle that often limits deployed robots to controlled environments.
"We pretrain a diffusion- or flow-based policy for broad behavioral coverage, then finetune it with a stable, sample-efficient residual off-policy RL framework," explained Zhanyi Sun, the paper's lead author, along with Shuran Song, Assistant Professor at Columbia.
Why It Matters
The gap between simulation and real-world robot performance has long frustrated the field. DICE-RL's claimed stability and sample efficiency could make it easier for robots to adapt to new tasks without extensive retraining or specialized infrastructure.
The work builds on the growing trend of using diffusion models for robot control, but adds a concrete finetuning mechanism that the researchers say combines selective behavior regularization with value-guided action selection.
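Value-guided action selection typically means something like the following hedged sketch: draw several candidate actions from the stochastic pretrained policy and execute the one a learned Q-function scores highest. Here `policy` and `q_fn` are placeholders, and the paper's exact selection and regularization rules may differ.

```python
import torch

@torch.no_grad()
def select_action(policy, q_fn, obs: torch.Tensor,
                  num_candidates: int = 16) -> torch.Tensor:
    """Pick the highest-value action among K samples from the policy."""
    obs_batch = obs.unsqueeze(0).expand(num_candidates, -1)   # (K, obs_dim)
    candidates = policy(obs_batch)                 # K action samples
    q_values = q_fn(obs_batch, candidates).squeeze(-1)  # score each candidate
    return candidates[q_values.argmax()]           # execute the best one
```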
The paper is available on arXiv (2603.10263) and the project website includes videos demonstrating the real-robot experiments.
This article synthesizes the arXiv submission by Zhanyi Sun and Shuran Song with verification against the original paper and project website. Success rate data comes directly from the authors' reported real-robot experiments.
Sources
- arxiv.org: arXiv:2603.10263
- zhanyisun.github.io: Project website
- cs.columbia.edu: Shuran Song faculty page
