Simon Willison’s LLM rewrite shows where AI toolchains now break
Frontier AI models are starting to break a quiet assumption in developer tooling: that you send in one prompt, get back one block of text, and move on. Simon Willison, the independent developer behind the open-source LLM command-line tool and Python library, just rewrote that contract in public because newer models now emit reasoning text, tool calls, attachments, and other mixed outputs that old prompt-in, text-out wrappers cannot represent cleanly.
The important part is not that Willison shipped another alpha release. It is that LLM 0.32a0 turns one of the better-known independent model wrappers from a prompt-response interface into a message-based one, and then tells plugin authors to rebuild around typed streaming events and provider metadata if they want to keep up with OpenAI, Anthropic, Google, and other fast-changing model APIs.
Willison wrote in his annotated release notes that previous versions of LLM "modeled the world in terms of prompts and responses" and that the old abstraction was "no longer able to represent everything I needed it to." According to his blog post announcing the alpha, the new release makes model inputs a sequence of messages and model responses a stream of differently typed parts. In practice, that means one response can now contain plain text, reasoning output, tool-call requests, tool results, and attachments rather than pretending all of it is just text.
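One way to picture the new shape is that a response is no longer a string but a sequence of typed parts. The sketch below is illustrative only: the class names and fields are hypothetical stand-ins, not LLM's actual Message and Part types, which live in the library itself.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical stand-ins for the typed parts the release notes describe;
# names and fields are illustrative, not LLM's real API.
@dataclass
class TextPart:
    text: str

@dataclass
class ReasoningPart:
    text: str

@dataclass
class ToolCallPart:
    name: str
    arguments: dict

@dataclass
class ToolResultPart:
    name: str
    output: str

Part = Union[TextPart, ReasoningPart, ToolCallPart, ToolResultPart]

@dataclass
class Message:
    role: str
    parts: list  # list[Part]

# One assistant turn can now mix reasoning, a tool call, its result,
# and visible text instead of pretending everything is plain text.
turn = Message(role="assistant", parts=[
    ReasoningPart(text="The user wants the weather; call the tool."),
    ToolCallPart(name="get_weather", arguments={"city": "Oslo"}),
    ToolResultPart(name="get_weather", output="4 degrees C, rain"),
    TextPart(text="It is 4 degrees and raining in Oslo."),
])

# The old text-only view survives as a projection over the parts:
visible_text = "".join(p.text for p in turn.parts if isinstance(p, TextPart))
print(visible_text)
```

The point of the structure is that the old prompt-response view is recoverable as a filter over the parts, while the reverse is not: once everything is flattened to text, the tool calls and reasoning are gone.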
That shift matters because frontier model vendors have been adding exactly those features, unevenly and fast. Willison's post points to reasoning support; image, audio, and video attachments; structured JSON output; and tool execution as the pressure behind the rewrite. His earlier April 5 research note made the problem explicit: vendors had added capabilities that his abstraction layer could not handle, including server-side tool execution.
The release notes show how deep the migration goes. According to the GitHub release, prompt inputs and outputs are now lists of Message objects made of typed Part objects such as text, reasoning, tool calls, tool results, and attachments. The same release adds a response.stream_events() method for typed event streams, a response.reply() method that can continue a conversation and automatically execute pending tool calls, and JSON-safe serialization for full conversation turns including provider-specific reasoning metadata.
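What consuming a typed event stream looks like in practice can be sketched without the library: the fake_stream generator below stands in for something like response.stream_events(), and the event classes are hypothetical, modeled on the kinds of events the release notes name rather than LLM's actual StreamEvent types.

```python
from dataclasses import dataclass

# Hypothetical event types; LLM's real StreamEvent classes may differ.
@dataclass
class TextEvent:
    delta: str

@dataclass
class ReasoningEvent:
    delta: str

@dataclass
class ToolCallEvent:
    name: str
    arguments: dict

def fake_stream():
    """Stand-in for a typed event stream like response.stream_events()."""
    yield ReasoningEvent(delta="Need the current time; use a tool.")
    yield ToolCallEvent(name="get_time", arguments={"tz": "UTC"})
    yield TextEvent(delta="The time is ")
    yield TextEvent(delta="12:00 UTC.")

# A consumer dispatches on event type instead of assuming plain text.
text_chunks, reasoning_chunks, tool_calls = [], [], []
for event in fake_stream():
    if isinstance(event, TextEvent):
        text_chunks.append(event.delta)
    elif isinstance(event, ReasoningEvent):
        reasoning_chunks.append(event.delta)
    elif isinstance(event, ToolCallEvent):
        tool_calls.append(event.name)

print("".join(text_chunks))  # the visible answer
print(tool_calls)            # tools the model asked to run
```

A wrapper that only concatenated deltas into one string would silently discard the reasoning and tool-call events above, which is exactly the failure mode the rewrite is meant to eliminate.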
This is where the burden moves beyond one tool. The expanded advanced plugin documentation now tells model-plugin authors they need to handle tools, attachments, token usage, and async execution. The 0.32a0 release notes go further, explicitly telling plugin authors to support StreamEvent, consume prompt.messages, and round-trip opaque provider metadata such as Anthropic reasoning signatures and Google's thoughtSignature values. Thin wrappers that only normalize text in and text out are becoming the brittle part of the stack.
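Round-tripping opaque metadata is a simpler requirement than it sounds, and plain JSON serialization can illustrate it. The field names below are illustrative, loosely modeled on the Anthropic reasoning-signature and Google thoughtSignature values the release notes mention; the point is only that a wrapper must preserve fields it does not understand.

```python
import json

# A serialized reasoning part carrying provider-specific metadata that
# the wrapper cannot interpret. Field names here are illustrative, not
# any provider's actual wire format.
part = {
    "type": "reasoning",
    "text": "model thinking goes here",
    "provider_metadata": {"thoughtSignature": "opaque-blob-abc123"},
}

def round_trip(obj):
    """Serialize and reload without dropping unrecognized fields."""
    return json.loads(json.dumps(obj))

restored = round_trip(part)

# The opaque signature survives intact, so it can be sent back to the
# provider on the next turn of the conversation.
print(restored["provider_metadata"]["thoughtSignature"])
```

The failure case is a wrapper that deserializes into a fixed schema, keeps only the fields it knows, and re-serializes: the signature is lost, and the provider may reject or mishandle the next turn.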
Willison also exposed some of the next problems rather than pretending they are solved. He wrote that the release is still alpha and said one large task remains: redesigning the SQLite logging system so it can capture these more detailed conversation graphs without duplication. The changelog already shows a quick follow-on patch, version 0.32a1 on April 29, fixing a bug where tool-calling conversations were not correctly restored from SQLite. That does not undercut the architectural shift, but it does show how messy the persistence layer gets once model output stops being a single transcript.
The skeptical read is straightforward. LLM is still one respected independent project, not the center of the tooling market, and alpha releases do not prove the wider ecosystem will follow. Big vendors can still pull developers back toward their own SDKs by shipping new features faster than shared abstractions can absorb them. But Willison has made the migration pressure unusually legible: if the model stream now contains thinking, tool use, attachments, and provider-specific metadata, the wrapper that flattens all of that into text is no longer simplifying the system. It is hiding the part developers increasingly need.
What to watch next is whether other cross-provider tools adopt the same message-and-events model, or whether the field snaps back toward vendor-specific clients. If more plugin authors decide they have to preserve reasoning traces, tool calls, and resumable conversations intact, the real competitive layer in AI tooling may shift away from prompt convenience and toward who can carry the full shape of the conversation without losing information.