Sony's Table Tennis Robot Learned Everything in a Simulator — Then Beat the Pros
Ace learned entirely in simulation, then produced a table tennis shot an Olympic medalist called impossible. Now the medalist is trying to learn it from the machine. That reversal of the knowledge flow is the part of this story the wire services did not touch.

Sony built a ping-pong robot that never touched a ball during training. On a regulation table at Sony's Tokyo headquarters, Ace — the company's table tennis robot — defeated three out of five elite human opponents under official International Table Tennis Federation rules. The achievement, published in Nature on April 22, is being called the first time a robot has reached expert-level performance in a commonly played competitive sport in the physical world. But the result that matters to robotics researchers is not the scoreboard.
Ace learned to play entirely inside a simulation. No human demonstration, no physical trial-and-error, no years of practice at a table. The robot was trained using a reinforcement learning algorithm called Soft Actor-Critic inside a custom physics model — essentially a high-fidelity video game of table tennis — then deployed to a real table. What transferred, and how well, is the part of this story that will determine whether it becomes a blueprint for the industry or remains a $50 million science project.
The latency gap
The most concrete measure of Ace's performance is not its win rate but its reaction time. Sony measured end-to-end latency — from ball detection to motor command — at 20.2 milliseconds. Elite human players, by contrast, typically require around 230 milliseconds to respond to an incoming shot. That is roughly an eleven-fold difference in the time available to make a decision.
To achieve that speed, Ace uses a hybrid vision system with nine high-resolution cameras running at 200 frames per second and three event-based sensors that track pixel-level changes in brightness at sub-millisecond resolution. The combined perception stack locates the ball in three-dimensional space with an error of 3 millimeters and a latency of 10.2 milliseconds. Ball spin, which can exceed 9,000 revolutions per minute and reach rotational velocities of 1,000 radians per second, is tracked by estimating the rotation of the ball's surface logo between frames. The robot's arm has eight degrees of freedom: six revolute joints and two prismatic ones, allowing it to adjust stroke angle and reach within a compact workspace.
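The two spin figures quoted above are consistent with each other: 9,000 revolutions per minute converts to roughly 940 radians per second, just under the 1,000 rad/s ceiling. A quick check:

```python
# Sanity check on the spin figures: 9,000 rpm and ~1,000 rad/s
# describe roughly the same rotational speed.
import math

def rpm_to_rad_per_s(rpm: float) -> float:
    """Convert revolutions per minute to radians per second."""
    return rpm * 2 * math.pi / 60

spin = rpm_to_rad_per_s(9_000)
print(f"9,000 rpm = {spin:.0f} rad/s")  # prints "9,000 rpm = 942 rad/s"
```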
The control policy itself is sampled at 31.25 hertz — one decision every 32 milliseconds — and each decision is mapped to a 32-millisecond trajectory segment sampled at 1 kilohertz. The entire loop from perception to action runs faster than the human visual-motor delay that defines how quickly a person can react to the same ball.
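The timing figures in the two paragraphs above fit together cleanly, which is worth checking numerically. All numbers come from the article; none of this is Sony's actual control code:

```python
# Timing relationships from the reported figures.
policy_rate_hz = 31.25       # control policy sampling rate
trajectory_rate_hz = 1_000   # trajectory segment sampling rate
robot_latency_ms = 20.2      # end-to-end, ball detection to motor command
human_latency_ms = 230       # typical elite-player reaction time

decision_period_ms = 1_000 / policy_rate_hz
samples_per_segment = int(trajectory_rate_hz * decision_period_ms / 1_000)

print(f"decision period: {decision_period_ms:.0f} ms")               # 32 ms
print(f"samples per segment: {samples_per_segment}")                 # 32
print(f"latency ratio: {human_latency_ms / robot_latency_ms:.1f}x")  # 11.4x
```

The eleven-fold figure in the article is the rounded ratio of 230 ms to 20.2 ms.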
What the simulation learned
Reinforcement learning in simulation is not new. The technique underpins game-playing systems like AlphaGo and has been applied to robot manipulation in lab settings for years. What is new is the combination of a purely simulation-based training pipeline, a high-speed physical task, and performance that meets or exceeds human experts under official rules.
The training algorithm, Soft Actor-Critic, is a model-free method — meaning it does not require an explicit model of the environment to learn an effective policy. Instead, it learns by maximizing a reward function that encodes the objective: win the rally, place the ball in a way the opponent cannot return. The physics model used during training was initialized from real gameplay captured from human players, then augmented with domain randomization to expose the policy to variations in ball mass, air resistance, and surface friction. Sony does not disclose whether the full simulation environment or training code has been released publicly.
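Domain randomization, the technique described above, can be sketched in a few lines: before each simulated episode, the physics parameters are perturbed so the learned policy cannot overfit to one exact simulator. The parameter names and ranges below are illustrative assumptions, not values from the paper:

```python
# A minimal sketch of domain randomization. Parameter ranges are
# illustrative guesses, not Sony's actual training configuration.
import random
from dataclasses import dataclass

@dataclass
class BallPhysics:
    mass_kg: float     # a regulation ball weighs about 2.7 g
    drag_coeff: float  # air resistance
    friction: float    # paddle/table surface friction

def randomized_physics(rng: random.Random) -> BallPhysics:
    """Sample a perturbed physics configuration for one training episode.

    Training across many such perturbations pushes the policy to work
    under any plausible real-world dynamics, not just the simulator's
    nominal settings.
    """
    return BallPhysics(
        mass_kg=rng.uniform(0.0025, 0.0029),
        drag_coeff=rng.uniform(0.38, 0.52),
        friction=rng.uniform(0.20, 0.35),
    )

rng = random.Random(0)
for _ in range(3):
    print(randomized_physics(rng))  # a fresh perturbation each episode
```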
Peter Dürr, director of Sony AI in Zurich and the project's lead author, told The Guardian that Ace improved continuously as it faced stronger opponents. "We played stronger and stronger players and we beat stronger and stronger players," he said. Michael Spranger, president of Sony AI, framed the result in terms of what it demonstrates about speed as a fundamental problem in robotics: "Speed is really one of the fundamental issues in robotics today, especially in scenarios or environments that are not fixed."
The paper acknowledges the gap between simulation and reality. A ball's spin in the real world depends on surface roughness, humidity, and the rubber properties of both paddle and ball — factors that can vary even between balls of the same model. The logo-tracking method for spin estimation, which Sony uses on both sides of the ball, requires branded balls and may not generalize to competition-grade equipment with unmarked surfaces.
What this means for robotics
Transferring a policy from simulation to the physical world, known in the field as Sim2Real, has been a persistent challenge. Policies trained in physics simulators frequently fail when deployed in the physical world because the simulator cannot capture the full complexity of real surfaces, materials, and contact dynamics. A robot that can fold laundry in a simulation may fumble catastrophically with a slightly different towel texture. Getting simulation-trained policies to work reliably in the real world has been the subject of years of research at Boston Dynamics, Google DeepMind, Figure, and numerous university labs.
If Ace's approach is genuinely transferable — not just to table tennis but to other high-speed, dynamic tasks — the implications for robotics R&D are significant. Physical data collection is slow, expensive, and limited by real-world constraints. A policy that trains in a simulation can run millions of trials in the time it would take a robot arm to accumulate a few thousand real-world attempts. For startups and research groups that cannot afford large fleets of physical robots, a validated Sim2Real pipeline could dramatically lower the cost of developing new capabilities.
John Billingsley, a robotics researcher at the University of Southern Queensland, gave the work cautious credit while noting its resource intensity. "They have gone at the task mob-handed, and used sledgehammer techniques," he told AP News. "True progress comes out of contests." The comment reflects a genuine divide in the field: whether the right measure of a robotics advance is absolute performance or the efficiency and generality of the approach.
The question Sony has not answered
Whether Ace represents a reproducible playbook or a one-off demonstration depends on a question Sony has not yet resolved publicly: will it release the simulation environment, training code, or both?
If the sim is shared, researchers at other institutions can test whether the same approach works for different sports, different manipulators, or different task geometries. That is the scenario where the table tennis result becomes a genuine inflection point for the field. If the sim remains closed, the result stands as an impressive demonstration by a well-resourced team — the kind of thing that lands in Nature and generates press releases, but does not propagate through the research community at the speed the field moves.
The history of similar results in robotics suggests the answer matters more than the result itself. GT Sophy, Sony's earlier superhuman racing AI for the Gran Turismo video game, demonstrated that reinforcement learning could master complex strategy and sportsmanship in simulation. It did not produce a broadly adopted training methodology for game AI development. Whether Ace follows that path or a different one is the question that matters beyond the headline number.
Kinjiro Nakamura, a table tennis player who competed for Japan at the 1992 Barcelona Olympics, watched Ace execute a shot he considered impossible for a human to attempt. "I didn't think it was possible," he told AP News. Since then, he said, he has been working to replicate it. That gap — between what a machine makes thinkable and what a human can execute — may be the most durable thing Ace has produced so far.


