Robots can backflip. They still can't button a shirt. Touch may be why.

Robots can backflip, dance in synchronized routines, and pick up eggs without crushing them. They cannot reliably button a shirt, plug in a phone charger, or hand an object from one gripper to another without dropping it. That gap between solved locomotion and broken manipulation is the field's defining open problem, per BBC reporting on dexterous robotic hands. The structural cause is touch.

Vision has been the dominant answer to physical robotics, and it is now hitting three structural walls at the moment of contact. Camera depth perception resolves millimeters, while a fingertip needs sub-millimeter precision to find the edge of a button. The contact side of any grasp is fully occluded from the camera, no matter how many angles you cover. And images carry no force feedback, so a robot "sees" an egg but cannot feel the load on its fingertips. These are physics limits rather than engineering gaps, according to a tactile-sensor entrepreneur who framed the field on a Chinese industry podcast.
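
The depth-resolution wall can be made concrete with a rough calculation. The sketch below assumes a stereo camera, which the reporting does not specify, and uses the standard first-order stereo error relation in which depth uncertainty grows with the square of distance. Every number in it (working distance, focal length, baseline, matching error) is a hypothetical placeholder rather than a spec from any real robot.

```python
def stereo_depth_error_mm(distance_m: float, focal_px: float,
                          baseline_m: float, disparity_err_px: float) -> float:
    """First-order stereo depth uncertainty: dz ~= z^2 * dd / (f * B).

    distance_m       -- distance from camera to object (z)
    focal_px         -- focal length expressed in pixels (f)
    baseline_m       -- separation between the two cameras (B)
    disparity_err_px -- pixel-matching error between the two views (dd)
    """
    dz_m = distance_m ** 2 * disparity_err_px / (focal_px * baseline_m)
    return dz_m * 1000.0  # metres -> millimetres


# Hypothetical setup: camera half a metre from the hand, 5 cm baseline,
# 600 px focal length, quarter-pixel matching error.
err_mm = stereo_depth_error_mm(distance_m=0.5, focal_px=600.0,
                               baseline_m=0.05, disparity_err_px=0.25)
print(f"estimated depth uncertainty: {err_mm:.2f} mm")
```

Under these assumed values the uncertainty comes out at about two millimeters, and doubling the working distance quadruples it. A different camera would give different numbers, but the quadratic dependence on distance is why moving the camera back for a wider view costs precision at the fingertip.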

Touch, on that account, is not a softer version of vision. It is a different modality. The sensors called "visuo-tactile" because they resemble small optical mice are in fact tactile: an elastic skin deforms on contact, an internal camera reads that deformation, and software converts the image into force and texture. The "vision" label is a misnomer. For comparison, a human fingertip carries roughly 3,000 sensing points and around 12,000 receptor neurons, which sets the "human-like five-senses" baseline for any sensor design that wants to compete with biology.
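
The three-step chain described above (skin deforms, camera reads the deformation, software converts it to force) can be sketched in a few lines. This is a toy illustration under loud assumptions, not any vendor's pipeline: it treats per-pixel brightness change as a linear proxy for indentation depth and models the skin as a bed of independent springs. All constants and function names are hypothetical, and texture recovery is left out.

```python
from typing import List

Frame = List[List[float]]  # grayscale image from the sensor's internal camera

# Hypothetical calibration constants, not taken from any real sensor.
MM_PER_INTENSITY = 0.01      # indentation depth (mm) per unit of brightness change
PIXEL_AREA_MM2 = 0.04        # patch of skin seen by one pixel (0.2 mm x 0.2 mm)
STIFFNESS_N_PER_MM3 = 0.05   # assumed spring stiffness of the skin per unit area
NOISE_FLOOR = 5.0            # brightness change below this counts as no contact


def indentation_map(reference: Frame, contact: Frame) -> Frame:
    """Turn the change between an untouched frame and a contact frame into depth (mm)."""
    depth = []
    for ref_row, con_row in zip(reference, contact):
        row = []
        for ref_px, con_px in zip(ref_row, con_row):
            delta = abs(con_px - ref_px)
            row.append(delta * MM_PER_INTENSITY if delta >= NOISE_FLOOR else 0.0)
        depth.append(row)
    return depth


def contact_area_mm2(depth: Frame) -> float:
    """Total skin area whose indentation is above the noise floor."""
    return sum(PIXEL_AREA_MM2 for row in depth for d in row if d > 0.0)


def normal_force_n(depth: Frame) -> float:
    """Sum per-pixel spring forces: F = stiffness * depth * area."""
    return sum(STIFFNESS_N_PER_MM3 * d * PIXEL_AREA_MM2 for row in depth for d in row)


# Synthetic 4x4 frames: a 2x2 patch is pressed, one corner pixel is just noise.
reference = [[100.0] * 4 for _ in range(4)]
contact = [row[:] for row in reference]
for r in (1, 2):
    for c in (1, 2):
        contact[r][c] = 150.0
contact[0][0] = 102.0  # below NOISE_FLOOR, so it is ignored

depth = indentation_map(reference, contact)
area = contact_area_mm2(depth)
force = normal_force_n(depth)
print(f"contact area: {area:.2f} mm^2, estimated normal force: {force:.4f} N")
```

Real sensors need a nonlinear, calibrated mapping from image to deformation, and they also recover shear and texture. The sketch only shows why the camera inside such a sensor is a readout mechanism: the quantity that comes out the other end is contact geometry and force, not a picture of the scene.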

Reading the same podcast framework as a field scorecard: motion control sits near 90 out of 100, dexterous hand hardware near 59, and the decision-making "brain" near 30. The point of the numbers is not precision. The point is the shape of the gap. Hardware for locomotion has converged. Hands have not converged on a winning design. And the brain has barely moved, because large language models have "read ten thousand books" but never "walked ten thousand miles" through a tactile world.

This is why 2026 is being talked about as the "year of tactile" (触觉元年), the first year of a new sensing era in robotics. The framing comes from a single guest on a Chinese podcast, not from a field-wide consensus, but it sits on three trends that can be checked independently. First, value consensus: teams training vision-language-action (VLA) models and world models have begun reporting that performance plateaus when manipulation is contact-rich. Second, technology convergence: electromagnetic, piezoresistive, and capacitive sensing routes have collapsed onto the visuo-tactile approach as the engineering winner. Third, cost viability: the guest claims tactile sensors can deliver a roughly 100,000x performance gain for a cost increase of only tens of percent, a number worth quoting carefully because it is a vendor pitch.

The strongest counterweight to the inflection-year story is not a competing technology. It is a data problem. Tactile data is closer to physical truth than images or text, smaller in volume than either, and high-frequency, which should make it the easiest modality to learn from. But the field has no open tactile dataset of the size that produced the vision and language revolutions. Training pipelines lean on simulation, where contact physics are still approximate. And no tactile sensor line has reached mass production, so the loop from real-world deployment back into training data is not yet closed. The "year of tactile" framing, in other words, has the engineering ingredients without the data flywheel that made vision and language work.

The case study for this bet is Yimu Technology (一目科技), a Chinese tactile-sensor company. An EET-China show profile and a CNU trade-media feature describe a decade-long push to take tactile sensors out of the lab and onto a manufacturing line, led by founder Li Zhiqiang (李智强). The company is positioned as a specialist in the visuo-tactile approach the broader field is converging on. The reporting here is largely company-adjacent in Chinese-language trade media, useful for product positioning but not independent third-party validation of market share or revenue, which is why the broader inflection-year claim has been hedged.

The bottleneck framing is not a Chinese-only view. BBC reporting on dexterous hands flags manipulation as the field's defining open problem, and an arXiv survey titled "Tactile Robotics: Past and Future" treats touch as a distinct subfield with its own roadmap rather than an extension of computer vision. The guest in the Chinese podcast treats touch as necessary but not sufficient. That distinction matters: hardware, data, and learning algorithms all have to move together, and only one of those three is clearly moving now.

What would falsify the "year of tactile" thesis is concrete. A tactile dataset released at the scale of ImageNet or Common Crawl. A sensor line shipping in volumes rather than engineering samples. A publicly benchmarked manipulation task that a touch-equipped robot solves and a vision-only robot does not. Until at least one of those arrives, the inflection story remains a credible hypothesis about a real bottleneck. It is not yet the scoreboard.

Sources