In January, a Waymo robotaxi struck a child near a school in Santa Monica. The vehicle slowed from 17 mph to 6 mph before contact. Within days, Waymo leaned on its own human-driver benchmark to argue publicly that an attentive human driver would have hit the child at roughly 14 mph. On Wednesday, Waymo published a replacement for that very model, co-developed with TU Delft and described in a paper in Nature Communications, according to TechCrunch reporter Sean O'Kane.
Waymo is calling the new model the Reference Driver, and it is built on a framework called active inference. In TechCrunch's summary, the framework treats a driver as an agent that constantly imagines possible futures and picks the action that leads to the safest, most predictable outcome. Waymo says the new model is more accurate than the human-driver benchmark it has used for several years. The functional shift, as Waymo frames it, is from grading human behavior at or near the moment of impact to reproducing a careful, competent human's behavior across the entire run-up to a crash.
That distinction is the load-bearing one. The 14 mph figure Waymo cited in January came from a model that, by Waymo's own description this week, no longer represents the company's best view of how humans drive. The old benchmark graded an encounter at its terminal point. The Reference Driver is designed to simulate a competent human's choices through the seconds that preceded contact, including the recognition, deceleration, and steering that a careful driver would have produced.
Waymo's framing of the upgrade, as reported by TechCrunch, leans on a comparison to physical and virtual crash dummies, the tools the auto industry has used for decades to evaluate the safety performance of cars. A dummy does not drive the car; it sits in for the human body so engineers can measure what would happen. The Reference Driver is positioned as the behavioral equivalent: a stand-in for a careful human, used to grade the actions of an autonomous system across the full sequence of an encounter.
It matters that Waymo built, validated, and will use this benchmark. There is no independent third-party scorecard in autonomous vehicle safety today, and Waymo is one of the most aggressive public voices in arguing that its robotaxis are safer than human drivers. The Reference Driver is, in the first instance, a tool Waymo will use to grade itself. Peer review in Nature Communications is a check on the modeling, not a check on the conclusions Waymo draws from it. We have not read the paper in full; the characterization above relies on TechCrunch's summary of Waymo's claims, not on the paper's contents.
The timing of the upgrade is the other reason it is news. NHTSA and NTSB investigations of the Santa Monica incident remain open, according to TechCrunch. Waymo, which is owned by Alphabet, is simultaneously expanding its commercial robotaxi service into more cities. The next safety claim the company makes, in any new market or in response to any new incident, will be measured against a benchmark the company itself rebuilt in the middle of an active federal investigation into a real-world crash.
Waymo's announcement is also a quiet concession. The company is not arguing that its prior model was wrong, exactly; it is arguing that the new model is more accurate than a benchmark it had previously called accurate enough to defend its public statements. Replacing your own yardstick mid-investigation is not a routine release. It is a sign that the prior yardstick was, at minimum, contestable in ways the company now wants to foreclose.
There is a constructive reading of the move, and it is worth naming. A better, externally peer-reviewed benchmark for human behavior raises the floor for everyone, including Waymo's critics, because it makes the company's future safety claims more auditable. If the Reference Driver is genuinely a more faithful model of a careful human driver, then the next time a Waymo robotaxi strikes a child or anyone else, the public will be able to evaluate the company's interpretation of "what a human would have done" against a model developed with academic collaborators and published in a peer-reviewed venue. That is a more useful artifact than the black box Waymo used in January.
The constructive reading has a limit. The Reference Driver is still a model. It still produces numbers. Those numbers will still be selected, summarized, and presented to the public by Waymo, in language favorable to the company's case unless someone else does the work of running independent scenarios. The paper's publication creates the conditions for that work, but it does not do it. The company that built the model is also the company that will interpret what the model says.
Three things to watch. First, the Nature Communications paper itself: the active-inference formulation, the datasets used to fit human behavior, the explicit comparison to the model Waymo used in January, and the limits the authors place on the model's validity outside the scenarios it was trained on. Those details are not in TechCrunch's summary, and this article does not lean on them. Second, the NHTSA and NTSB investigations of the Santa Monica incident, which will eventually have to make a finding about Waymo's public statements on the crash, including the 14 mph figure. Third, the next time Waymo uses the Reference Driver in public: what scenario it picks, what number it produces, and whether it publishes the inputs to the model so independent researchers can rerun the comparison.
For now, the headline is not that Waymo's robotaxis are safer or less safe than human drivers. The headline is that the model Waymo has used to claim they are safer just got rebuilt, in the open, mid-investigation, and that the next round of safety arguments will be measured against the new one.