Picture a home robot rolling from the kitchen into the hallway. From a person's perspective, the threshold is obvious. From the robot's perspective, it is a live argument about where one room ends and another begins. Run two of today's best indoor mapping systems across the same apartment and they can return five rooms or eight, depending on which one is doing the counting.
That disagreement is the engine of a new preprint on hierarchical 3D scene graphs for indoor robots, "Occupancy-Grounded Room Segmentation for Hierarchical 3D Scene Graphs." The paper does not just propose a new way to count rooms. It argues, with evidence, that the field has been comparing room counts without a shared way to measure them.
Indoor robots organize what they see into layers: small things (cups, chairs), mid-sized things (rooms), and the building as a whole. The room layer is the connective tissue that lets a robot reason about "go back to the kitchen" rather than "go back to the polygon I labeled 7." Different research groups build that layer from different spatial substrates: place clusters, wall planes, segmentation outputs. Because no two systems share a geometric yardstick, two groups can map the same space and disagree about how many rooms it contains, or where the boundaries fall.
The proposed fix is concrete. Instead of inferring a room from walls or from where the robot paused, the new pipeline anchors each room to a tracked region of free space. The robot sweeps the apartment, the system records which empty regions connect to which, and each room gets a polygonal footprint that can be checked against an annotated ground truth. In short: pin a room to the empty space it owns, draw a polygon around that region, and measure whether the polygon matches the labeled room.
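To make the idea of "a room is the free space it owns" tangible, here is a minimal Python sketch. It is not the paper's pipeline. It assumes a toy 2D occupancy grid, uses plain flood fill to group connected free cells, and scores a region against a labeled footprint with intersection-over-union (IoU). The names `label_free_regions` and `iou`, the grid encoding, and the choice of IoU as the match score are all illustrative assumptions. Footprints are represented as sets of grid cells rather than true polygons.

```python
from collections import deque

FREE, OCCUPIED = 0, 1  # assumed cell encoding for this toy grid


def label_free_regions(grid):
    """Group 4-connected free cells; returns {label: set of (row, col)}."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    regions = {}
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE or (r, c) in seen:
                continue
            # Breadth-first flood fill from this unvisited free cell.
            label = len(regions)
            cells = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                cells.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and grid[ny][nx] == FREE and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            regions[label] = cells
    return regions


def iou(a, b):
    """Intersection-over-union of two cell sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy map: a solid wall (column 2) separates two free regions.
grid = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
regions = label_free_regions(grid)

# Hypothetical annotated footprint for the left room.
gt_left = {(r, c) for r in range(3) for c in range(2)}
left_score = iou(regions[0], gt_left)
```

A real apartment has doorways, and plain flood fill would merge every room joined by an open door into one region. Any working system needs something beyond this sketch to split free space at thresholds. The sketch only shows why a free-space footprint is directly comparable to a labeled one.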
That geometric criterion is what makes the paper a methodological contribution rather than a benchmark win. The same yardstick can be applied to any future pipeline, including the dominant one, Hydra, which builds rooms from place connectivity.
The team evaluated the new pipeline on 12 scenes from Matterport3D, a standard indoor dataset, and compared it against Hydra. The result, in plain language: the occupancy-grounded pipeline recovers substantially more room instances than the place-connectivity baseline. The cost is precision. The new method draws some of its polygons sloppily, and wall-accurate boundaries remain an open problem for every method in the comparison, including the new one. The paper earns trust by naming that limit.
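The trade-off between finding more rooms and drawing them well can be illustrated with a small scoring sketch. The article's source does not spell out the exact metrics, so the following assumes a common instance-matching setup: predicted and labeled rooms are cell sets, pairs are matched greedily by IoU, and a match counts only above a threshold of 0.5. The function name `match_rooms`, the threshold, and the toy data are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two cell sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_rooms(predicted, ground_truth, threshold=0.5):
    """Greedy one-to-one matching by descending IoU; returns (precision, recall)."""
    pairs = sorted(
        (
            (iou(p, g), i, j)
            for i, p in enumerate(predicted)
            for j, g in enumerate(ground_truth)
        ),
        reverse=True,
    )
    used_pred, used_gt = set(), set()
    matches = 0
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs overlap too little to count
        if i in used_pred or j in used_gt:
            continue
        used_pred.add(i)
        used_gt.add(j)
        matches += 1
    precision = matches / len(predicted) if predicted else 0.0
    recall = matches / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Toy data: three labeled rooms, each a strip of ten cells.
ground_truth = [{(row, x) for x in range(10)} for row in range(3)]

# Three predictions: one exact, one too small (IoU 0.4), one slightly short (IoU 0.8).
predicted = [
    {(0, x) for x in range(10)},
    {(1, x) for x in range(4)},
    {(2, x) for x in range(8)},
]

precision, recall = match_rooms(predicted, ground_truth)
```

In this toy case the undersized footprint fails the threshold, so two of three predictions match two of three labeled rooms. Raising the threshold penalizes loose boundaries more heavily, which is one way a method can recover many rooms yet still score poorly on boundary quality.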
The 12-scene evaluation is a real benchmark, but it is not "all indoor robotics." Generalization claims beyond Matterport3D would need more scenes, more building types, and more cluttered environments. Anyone tempted to treat this as a deployment-ready result should keep that limit in view.
For practitioners, the useful artifact is the released code. The pipeline is open source, the geometric criterion is documented, and other groups can now run the same test on their own room-segmentation methods. The paper supplies the falsifiable test the field lacked, and the authors flag what it does not yet solve.
The next paper to watch is the one that fixes the wall-boundary problem without giving up the recall. The yardstick is in place. The next measurement is overdue.