OpenAI has shipped a new voice model, GPT-Realtime-2, that the company is marketing as the first of its kind with GPT-5-class reasoning. The only way to reach it, as of June 12, 2026, is through OpenAI's realtime audio API, a WebRTC-based endpoint that streams speech to and from the model in a browser. The ChatGPT iPhone app does not include it, based on Willison's testing as of the post date. (Simon Willison's Weblog)
If the marketing claim holds, GPT-5-class reasoning in a voice model would bring the kind of multi-step thinking that text-based GPT-5 handles routinely and that earlier voice models have stumbled on. You could ask such a model to compare two contracts, walk through a debugging log, or summarize a long PDF, and have it follow the thread out loud. To show what that looks like in practice, independent developer Simon Willison updated the WebRTC audio playground he first built in December 2024 so that it points at GPT-Realtime-2, and added a text box where a user can paste a large chunk of document context before starting the audio session. The result is a spoken conversation grounded in text the user supplies, not in anything the model retrieved. (Simon Willison's Weblog)
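For readers who want to picture the paste-in step, here is a minimal sketch of how pasted text could be packaged before an audio session begins. It is not Willison's code. The event shape is modelled on the `session.update` client event in OpenAI's existing Realtime API, and whether GPT-Realtime-2 accepts exactly this shape is an assumption. The function name `build_session_update`, the `<document>` markers, the instruction wording, and the `MAX_CONTEXT_CHARS` cap are all hypothetical.

```python
import json

# Hypothetical cap on pasted text; the source does not state a real limit.
MAX_CONTEXT_CHARS = 200_000


def build_session_update(document_text: str, max_chars: int = MAX_CONTEXT_CHARS) -> str:
    """Return a JSON string for a session.update-style event carrying pasted text.

    Assumption: the session accepts free-form instructions, as the existing
    Realtime API does. Nothing here is confirmed for GPT-Realtime-2.
    """
    text = document_text.strip()
    if not text:
        raise ValueError("no document text was pasted")
    if len(text) > max_chars:
        raise ValueError(f"pasted text is {len(text)} characters; limit is {max_chars}")

    # The whole document travels as one blob inside the instructions:
    # no index, no chunking, no retrieval step.
    instructions = (
        "Answer questions using only the document between the markers below. "
        "If the answer is not in the document, say so.\n"
        "<document>\n" + text + "\n</document>"
    )
    event = {"type": "session.update", "session": {"instructions": instructions}}
    return json.dumps(event)
```

In a browser playground the resulting string would presumably be sent over the session's data channel once the connection is open. That transport step is omitted here because the source does not describe it.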
The limits are worth naming. According to the API surface Willison tested, GPT-Realtime-2's training data cuts off on September 30, 2024, which means anything that has happened in the twenty months since is, for this model, simply unknown. The "talk to your documents" experience is paste-and-talk, not retrieval: there is no index, no chunking, no citations back to source pages, only the text the user hands the model in a single blob. And the API-versus-consumer gap is a recurring OpenAI pattern. A capability ships to developers first, sits there for months, and either trickles into ChatGPT later or never does. Whether GPT-Realtime-2 lands in the ChatGPT iPhone app, and how much of its claimed reasoning survives the trip from API to consumer product, are the questions worth watching. (Simon Willison's Weblog)
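The twenty-month figure follows from the two dates given in this piece: the September 30, 2024 training cutoff and the June 12, 2026 as-of date. A small sketch of that arithmetic, counting only completed calendar months (the helper name `whole_months_between` is illustrative):

```python
from datetime import date


def whole_months_between(start: date, end: date) -> int:
    """Count completed calendar months from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # If the day of the month has not yet been reached, the last month is incomplete.
    if end.day < start.day:
        months -= 1
    return months


cutoff = date(2024, 9, 30)   # training cutoff reported in the article
as_of = date(2026, 6, 12)    # as-of date given in the article
print(whole_months_between(cutoff, as_of))  # prints 20
```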