Google built a robotics model that reads pressure gauges. It also published numbers that defense and industrial buyers will care about more: the robot is 6 percentage points better at text-based safety reasoning and 10 points better at identifying hazards in video than the previous version.
Those numbers — disclosed quietly in a Google DeepMind blog post on April 14 alongside the announcement of Gemini Robotics-ER 1.6 — are not the kind of metric that makes headlines in the AI world. But they are the answer to the question procurement offices actually ask before signing contracts for robots that work near humans: is this thing safe enough to let near my people? The capability numbers — 86 percent accuracy on instrument reading, 93 percent with a feature called agentic vision — are secondary. The safety data is the pitch.
Gemini Robotics-ER 1.6 is Google's second generation of embodied reasoning software: a high-level reasoning layer, not pre-programmed instructions. It can read analog gauges, sight glasses, and complex instruments; plan multi-step tasks across different camera views; and detect when it has successfully completed something. On instrument reading, it scores 86 percent accuracy, up from 23 percent with the previous version. With agentic vision — which chains visual reasoning with on-the-fly code execution — it reaches 93 percent. The Boston Dynamics partnership is the commercial reference case. Spot, the quadruped robot deployed in over 1,500 facilities worldwide, is the flagship integration for instrument reading. Marco da Silva, vice president and general manager of Spot at Boston Dynamics, said in a joint statement that instrument reading and improved task reasoning will enable Spot to see, understand, and react to real-world challenges completely autonomously. Facility inspection is the application; the partnership is the proof point.
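The "agentic vision" idea, a model that chains visual reasoning with on-the-fly code execution, can be pictured as a perceive-act loop: the model alternates between requesting a small image operation (such as zooming into the gauge) and reading the result. The sketch below is purely illustrative; the `Observation` type, `crop` tool, and stub policy are invented for this example, and Google has not published the actual interface.

```python
# Hypothetical sketch of an agentic-vision loop. The model alternates between
# executing small image operations (here, a crop/zoom) and reasoning over the
# result until it can produce a reading. All names are illustrative; this is
# NOT the Gemini Robotics API.
from dataclasses import dataclass

@dataclass
class Observation:
    # Stand-in for a camera frame: a 2D grid of pixel intensities.
    pixels: list

def crop(obs: Observation, x0: int, y0: int, x1: int, y1: int) -> Observation:
    """Tool the model can invoke: zoom into a region of interest."""
    return Observation([row[x0:x1] for row in obs.pixels[y0:y1]])

def model_step(obs: Observation, scratch: dict):
    """Stub policy: first request a zoom on the gauge region, then read it.
    A real system would emit generated code for a sandboxed interpreter."""
    if "zoomed" not in scratch:
        scratch["zoomed"] = True
        return ("execute", lambda o: crop(o, 2, 2, 6, 6))
    # "Read" the gauge: here, just the mean intensity of the zoomed patch.
    flat = [p for row in obs.pixels for p in row]
    return ("answer", sum(flat) / len(flat))

def agentic_vision(obs: Observation, max_steps: int = 4):
    scratch = {}
    for _ in range(max_steps):
        kind, payload = model_step(obs, scratch)
        if kind == "execute":
            obs = payload(obs)   # run the requested operation, observe result
        else:
            return payload       # final reading
    return None
```

The design point the sketch captures is why this helps on instruments: rather than answering from one full-frame glance, the model can act on the image first (zoom, re-measure) and only then commit to a reading.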
But the safety numbers buried in the same post tell a different story than the capability metrics. Google explicitly framed Gemini Robotics-ER 1.6 as its safest robotics model to date, citing superior compliance with Gemini safety policies on adversarial spatial reasoning tasks and improved adherence to physical safety constraints — things like not picking up objects heavier than 20 kilograms or handling liquids. The specific framing — adherence to safety constraints, injury risk identification on real-world injury reports — is not benchmark language. It is procurement language, as IEEE Spectrum noted.
The caveats are real: the gains are modest (6 and 10 percentage points) and were measured on ASIMOV v2, a benchmark Google helped develop. The model is vision-only and cannot use touch or force feedback to confirm a grasp. And real-world deployment data beyond the Boston Dynamics partnership is not public.
But the strategic signal is clear. When a company with Google's resources decides to publish safety benchmarks alongside capability benchmarks, it is not doing it for the research community. It is talking to the procurement office. The robotics industry has spent years trying to prove its robots work. Google is now trying to prove its robot is safe enough to work next to you.