For decades, the bottleneck in factory and household automation has not been a robot's ability to reach an object. It has been what happens the instant its fingers close around it. Cable connectors buckle. Garments fold unpredictably. Bread bags reseal themselves. The robot sees the target, plans the trajectory, executes the move, and then loses the plot, because the part of manipulation that actually matters, the physics of contact, has been treated as a footnote to a dexterity problem that nobody could quite solve.
A thesis gaining ground at research labs and trade shows argues that the field has had the diagnosis wrong. Dexterity, in this reading, is the wrong axis to optimize. The right axis is contact: the moment-by-moment sensing and control of force, friction, deformation, and geometry as a robot's end effector meets a real object. Frame it that way and a generation of stubborn automation problems stops looking like an arm-speed problem and starts looking like a perception problem, the kind that touch, as a third modality in physical AI alongside vision and language, could plausibly crack.
The clearest public expression of this framing came out of ICRA 2026 in Vienna, where IEEE Spectrum ran a sponsored feature on AGILINK, a Chinese manipulation hardware company that has begun calling its stack "contact intelligence" rather than the older "motion intelligence" label the dexterity camp preferred. The piece, labeled as a vendor-sponsored placement, walked readers through a balloon-dog demonstration the company ran on the show floor: a hand trained on demonstrations from professional balloon artists plus targeted human intervention data, then refined with reinforcement learning. The task was deliberately chosen. Balloons are deformable, the contact regime shifts with every twist, and a single moment of excess force tears the workpiece. It is exactly the class of manipulation that classical position-controlled arms and even recent vision-language-action models tend to fail on.
The hardware that ran the demo is AGILINK's OmniHand 3 Ultra-M, a 20-degree-of-freedom direct-drive hand weighing roughly 630 grams and built at human scale. Every fingertip carries a tactile sensor, and the palm adds around 300 more 3D tactile points. The vendor's datasheet lists force resolution near 0.005 newtons, spatial resolution around 0.04 millimeters, and roughly 50,000 sensing points per square centimeter, with a 3-kilogram stable grasp, an 8-kilogram lift capacity, and ±0.2 millimeter repeatability. On paper, that is two to three orders of magnitude finer than the contact sensing most off-the-shelf grippers ship with, and the architecture (CAN-FD, RS485, Ethernet, ROS 2, and Isaac/MuJoCo simulation hooks) is built to feed dense contact data into a learning loop rather than into a low-rate reflex controller.
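The spatial-resolution and density figures can be cross-checked with a few lines of arithmetic. The sketch below assumes the sensing points sit on a uniform square grid, which the quoted datasheet figures do not state; the function names are illustrative only.

```python
import math


def points_per_cm2(pitch_mm: float) -> float:
    """Points per square centimeter for a square grid with the given pitch (assumed layout)."""
    points_per_cm = 10.0 / pitch_mm  # 10 millimeters per centimeter
    return points_per_cm ** 2


def pitch_mm_for_density(density_per_cm2: float) -> float:
    """Grid pitch in millimeters implied by a density in points per square centimeter."""
    return 10.0 / math.sqrt(density_per_cm2)


# Vendor figures quoted above: ~0.04 mm spatial resolution, ~50,000 points per cm^2.
print(round(points_per_cm2(0.04)))              # 62500
print(round(pitch_mm_for_density(50_000), 4))   # 0.0447
```

Under that grid assumption, a 0.04 millimeter pitch implies about 62,500 points per square centimeter, and 50,000 points per square centimeter implies a pitch of about 0.045 millimeters. The two vendor figures are at least consistent with each other, which says nothing about whether either has been independently measured.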
The specs are vendor figures, and a careful reader should hold them that way. The same article carries AGILINK's homepage claim of "No. 1 in dexterous hand market share in China (Q1 2026)" and a count of 8,000-plus hands shipped and 50,000-plus real-world data hours, none of it independently audited. There is no third-party benchmark, no peer-reviewed paper on contact intelligence, and no quoted researcher outside the company in the Spectrum piece. Treat the contact-versus-motion dichotomy as AGILINK's marketing frame for now, and treat the Ultra-M numbers as a vendor's best case for what direct-drive tactile hardware can do, not as an industry consensus.
That said, the framing itself is not purely a marketing invention. Anyone who has tried to automate a wire-harness shop, a garment warehouse, or an assembly line with even modest deformable-parts handling has run into the contact wall. A robot can land its gripper on a connector, and the moment the plastic housing deflects by a fraction of a millimeter, the position-based plan is no longer valid. Wrist-mounted force sensing helps, but it reports only the net force and torque at the wrist, not where or how the fingers are touching the part. Distributed tactile sensing, where thousands of points on the fingertip register pressure and shear in real time, is closer to what a human hand does, and the question is whether feeding that data into the same kind of learning stack that absorbed vision and language a decade ago can do for contact what convolutional networks did for image recognition.
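A toy illustration of what an aggregate force reading hides from a controller, and what a tactile array recovers. The 3x3 grid, the units, and the function names are all hypothetical, and a real wrist sensor also reports torque, which narrows the ambiguity without removing it once several fingers are in contact.

```python
from typing import List, Tuple

Grid = List[List[float]]  # normal pressure per tactile cell, arbitrary units


def net_force(grid: Grid) -> float:
    """Sum of all cells: the only thing an aggregate force reading would report."""
    return sum(sum(row) for row in grid)


def center_of_pressure(grid: Grid) -> Tuple[float, float]:
    """Pressure-weighted (row, column) location of the contact patch."""
    total = net_force(grid)
    if total == 0:
        raise ValueError("no contact: center of pressure is undefined")
    row = sum(i * v for i, r in enumerate(grid) for v in r) / total
    col = sum(j * v for r in grid for j, v in enumerate(r)) / total
    return (row, col)


# Same total load, pressed at the middle of the pad versus at one corner.
centered = [[0.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0]]
corner = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 4.0]]

print(net_force(centered), net_force(corner))  # 4.0 4.0
print(center_of_pressure(centered))            # (1.0, 1.0)
print(center_of_pressure(corner))              # (2.0, 2.0)
```

The two contacts are indistinguishable by total force, yet one is a secure pad grasp and the other is an object about to roll off the fingertip. That difference is the information a distributed sensor adds.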
There are honest reasons that has not happened yet. Sim-to-real for contact is genuinely hard. Contact dynamics are stiff, non-smooth, and resistant to the differentiable simulators that have made manipulation research tractable. Tactile sensors remain expensive, brittle, and short-lived compared to cameras, with calibration drift that would be unacceptable in a vision pipeline. And the dexterity benchmarks the field has standardized on, from block stacking to in-hand reorientation to dexterous tool use, tend to score only whether the task succeeded, so a contact failure registers as one more failed episode for the model to learn from rather than as an event measured in its own right, and that outcome-only regime is where most research robots still operate. A benchmark that rewards "noticed the bag was about to slip and rolled the grip to recover" does not exist in the way that ImageNet rewarded visual classification.
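To make "noticed the bag was about to slip" concrete, here is a minimal sketch of one textbook way such an event could be detected from per-point tactile readings, using the Coulomb friction model. It is an illustration under stated assumptions, not AGILINK's method or any benchmark's scoring rule: the friction coefficient, the threshold, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def slip_margin(normal_n: float, shear_n: float, mu: float) -> float:
    """Fraction of the Coulomb friction limit still unused at one contact point.

    1.0 means no shear load, 0.0 means at the slip limit, negative means past it.
    """
    if normal_n <= 0 or mu <= 0:
        raise ValueError("normal force and friction coefficient must be positive")
    return 1.0 - abs(shear_n) / (mu * normal_n)


def needs_recovery(readings: Iterable[Tuple[float, float]], mu: float,
                   threshold: float = 0.2) -> bool:
    """True if any in-contact point has less friction margin than the threshold."""
    margins = [slip_margin(n, s, mu) for n, s in readings if n > 0]
    if not margins:
        return True  # nothing is in contact, so nothing is being held
    return min(margins) < threshold


# (normal force, shear force) in newtons at two hypothetical contact points.
readings = [(1.0, 0.2), (0.8, 0.35)]
print(needs_recovery(readings, mu=0.5))            # True: second point is near its limit
print(needs_recovery([(1.0, 0.2)], mu=0.5))        # False: margin is about 0.6
```

A benchmark built around contact would need to log and score a signal of this kind during the episode, rather than only whether the object ended up where it was supposed to.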
What to watch over the next 12 to 24 months is whether the contact thesis moves from sponsored features and trade-show demos into the kind of artifact a serious lab can build on. Three signals would matter. First, an independent benchmark that scores manipulation policies on contact-rich tasks the way ImageNet scored vision, with public leaderboards and reproducible code. Second, peer-reviewed evidence that dense tactile data improves policy performance on the dexterity tasks the field already uses, not just on demos designed to showcase tactile hardware. Third, a tactile sensor with a price, lifetime, and calibration story that survives a year on a factory floor. If those arrive, the contact era stops being a vendor thesis and becomes a research program. If they do not, the field will keep calling the problem dexterity and keep losing the plot at the moment of contact.