Voice Is the Default Now. Who Gets Priced Out?
Apple and Google are building AI assistants that want to be spoken to, not typed. That design choice carries a quiet cost for people who can't or won't talk to a phone in public.
On stage at Apple's WWDC 2026 in June, company presenters did something that would have looked strange a decade ago: they spoke to their iPhones to show off a new, more conversational Siri. The same week, at Google I/O 2026, the company's Gemini assistant was shown off in the same mode: voice in, voice out, no keyboard in sight. In a CNET commentary on the trend, senior writer Jeff Carlson argues the industry is making that bet without counting the people it leaves behind.
The demos shared a design choice: that the next generation of consumer AI will be heard, not read. Voice-first interfaces are sold as the natural next step in computing, and for many users they are. For others, they are a tax, paid in stuttered phrases on a quiet commuter train, in a signed conversation an assistant cannot parse, in an open-plan office where speaking to a phone means performing for the whole floor.
Who pays the cost. The cost is uneven. People who stutter may find that an assistant built around fluent speech asks for a fluency they cannot summon on demand. Deaf and hard-of-hearing users in shared spaces, and anyone who lip-reads or signs, are working in a modality the assistant was not designed for. So are people in open offices, libraries, or shared housing, where speaking to a phone means announcing a calendar event to strangers. Neurodivergent users may find the social contract of "talking to your phone" a small performance they did not sign up for. Second-language speakers carry the tax of accent, of hesitation, of words the assistant may not recognize. So do people who simply prefer not to narrate their lives to a microphone, and Carlson is right that this last group is real, not a fringe case.
This is not a failure of speech recognition. It is a design choice. Voice is being elevated from one input among many to the default way to use a product. When the default is voice, every other input — keyboard, gesture, glance, switch control — becomes the exception that has to be remembered, re-enabled, or reverse-engineered. The cost of that elevation is borne by the people who already had the most to lose from being forced to perform a conversation for a machine.
What already works. None of this requires a new invention. The building blocks for a multimodal default already exist. iOS and Android ship with full keyboard and dictation paths that still function. Accessibility settings for switch control, screen readers, and text-based assistant invocation are mature. Third-party keyboards, intent APIs, and on-device language models can route a request through text even when the assistant's flagship mode is voice. The question is not whether the technology exists. It is whether product teams treat those paths as first-class or as legacy, and whether reviewers and buyers notice the difference.
The choice that has to be made. Carlson's piece is constructive rather than scolding, and the constructive point is that this is a choice, not an inevitability. There are policy levers that could make the choice explicit. The EU AI Act already carries accessibility language, and how strictly that language is read against voice-mandatory defaults will matter. Industry standards for on-device intent APIs, third-party keyboard support, and accessibility-first assistant invocation are not glamorous, and they are exactly the kind of plumbing that determines whether a person who cannot speak to a phone in public is a first-class user of the AI transition or a footnote to it.
What to watch. Whether the new Siri and Gemini ship with keyboard and text-first entry points as defaults or as hidden settings. Whether smart-glasses hardware, a surface that has no keyboard, makes voice the only input that scales. Whether third-party assistants keep text as a peer to speech. And whether accessibility researchers are in the room when the next default is set, or only in the press release after it ships.
The keynote stage is not where most people use a phone. They use it on a quiet train, in a shared kitchen, in a packed subway car, in a hospital waiting room, in a library. The question is not whether voice is a powerful input. It is whether everyone gets to keep the others, and whether the industry's answer to that question is treated as a product decision or a civil-rights one.