
Microsoft Research Shows How Predicting Future States Makes AI Learn Faster from Demonstrations
Microsoft Research has published new work on why Predictive Inverse Dynamics Models (PIDMs) outperform standard Behavior Cloning in imitation learning—and the answer is intuitive: it's easier to copy when you understand the goal.
Imitation learning trains AI agents by showing them examples of humans performing tasks. The dominant approach, Behavior Cloning (BC), simply asks: "Given the current state, what action would an expert take?" But this creates ambiguity because the same action could serve many different goals.
PIDMs take a different approach. Instead of directly mapping states to actions, they ask two questions: "What should happen next?" and "What action would get us there?" By predicting plausible future states, PIDMs clarify intent—which makes action prediction easier.
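To make the contrast concrete, here is a minimal Python sketch of the two decision rules on a toy 1-D task. All function names and the toy dynamics are illustrative assumptions for exposition, not Microsoft's implementation.

```python
def bc_policy(state, policy_fn):
    """Behavior Cloning: map the current state directly to an action."""
    return policy_fn(state)

def pidm_policy(state, predict_future, inverse_dynamics):
    """PIDM-style policy: first predict a plausible future state,
    then infer the action that would reach it (inverse dynamics)."""
    predicted_next = predict_future(state)          # "What should happen next?"
    return inverse_dynamics(state, predicted_next)  # "What action gets us there?"

# Toy 1-D example: the agent should move toward a goal at position 10.
GOAL = 10.0

def toy_predictor(state):
    # Predict the next state: one unit step toward the goal.
    return state + (1.0 if state < GOAL else -1.0)

def toy_inverse_dynamics(state, next_state):
    # Recover the action (displacement) that turns state into next_state.
    return next_state - state

# BC must learn the state-to-action mapping in one opaque step;
# PIDM factors it into an interpretable prediction plus a simpler
# inverse-dynamics problem.
action = pidm_policy(3.0, toy_predictor, toy_inverse_dynamics)
print(action)  # 1.0: a step toward the goal
```

Note the division of labor: even if `toy_predictor` were slightly off, the inverse-dynamics step would still be aimed at an explicit target state, which is the intent-clarifying effect the article describes.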
"Instead of asking, 'What action would an expert take?' PIDMs effectively ask, 'What would an expert try to achieve, and what action would lead to it?'" Microsoft noted.
The key insight: even imperfect predictions reduce ambiguity enough to matter. Microsoft found that PIDMs can achieve comparable performance with as few as one-fifth the demonstrations required by Behavior Cloning—a significant advantage when gathering human demonstrations is costly.
The team tested PIDMs in a demanding real-world setting: training agents on human gameplay in a 3D video game (Bleeding Edge), operating directly from raw video at 30 frames per second with network delays and visual distortions. Despite these challenges, PIDM agents closely matched human timing and movement, successfully completing tasks that stumped naive action-replay baselines.
In 2D environments, BC needed two to five times more data to match PIDM's performance. In the 3D game, BC needed 66% more data.
The limitation: if predictions become too unreliable, they can mislead the model. But when predictions are reasonably accurate, "clarifying intent often matters more than accurately predicting the future," Microsoft noted.
Sources
- microsoft.com (Microsoft Research Blog)
