Can a Local Model Actually Run Your Database? Simon Willison Just Built the Test Case
Simon Willison spent three years building a Python library for running LLMs. On May 21, 2026, he announced Datasette Agent, a new open-source plugin for the Datasette data platform that lets you have a conversation with your SQLite databases. Ask a question in plain English; the model writes the SQL, runs it, and returns an answer. The live demo on agent.datasette.io runs Gemini 3.1 Flash-Lite. But Willison also documented a one-liner that swaps in gemma-4-26b-a4b, a Google model he runs locally on a Mac through LM Studio, an app for serving local AI models. No API call. No bill at the end of the month.
That second detail is the thing worth testing.
Datasette Agent is not the first AI tool that touches a database. What makes it interesting is the combination: a working plugin architecture, a credible developer behind it, and the explicit claim that open-weight models can now handle structured-data tool use reliably enough to ship. Willison has been building LLM tooling since 2023. He does not usually announce things that do not work.
The agent is also bound up with the rest of his work. In a post published the same day as the Datasette Agent launch, Willison noted that the LLM 0.32 refactor was directly informed by building the agent. The two projects are entangled. This is not a cherry-picked success story: his model-abstraction library and his agent are parts of the same living experiment.
And yet a thread on the LocalLLaMA subreddit, posted this week by a developer running Gemma 4 26B-A4B via Ollama on a personal AI platform backed by SQLite, documented a failure that looks similar in structure to what Datasette Agent is trying to do. The model was tasked with a code audit using structured tool use. It fabricated the entire audit. Not a partial hallucination, not a minor error. The whole thing.
The task in Datasette Agent is SQL query generation, which may have a lower hallucination surface than a code audit. A SQL query either runs or it does not, and the database rejects an invalid one outright, although a query that runs can still answer the wrong question. A code audit requires semantic reasoning about whether the code is correct, a task with no comparably mechanical check to appeal to. But the failure mode is similar enough to be worth taking seriously.
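The "runs or it does not" check is mechanical, which is what makes it easy to automate. A minimal sketch using Python's standard sqlite3 module might look like the following. The sales table, the two queries, and the query_runs helper are hypothetical illustrations, not anything taken from Datasette Agent's code.

```python
import sqlite3


def query_runs(conn, sql):
    """Return (True, None) if SQLite can compile the statement, else (False, error text)."""
    try:
        # EXPLAIN makes SQLite parse and plan the statement without returning its rows.
        conn.execute(f"EXPLAIN {sql}")
        return True, None
    except sqlite3.Error as exc:
        return False, str(exc)


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# A query that refers to real columns compiles.
ok, err = query_runs(conn, "SELECT region, SUM(amount) FROM sales GROUP BY region")

# A query that refers to a column that does not exist is rejected by the database.
bad, bad_err = query_runs(conn, "SELECT region, SUM(total) FROM sales GROUP BY region")
```

A check like this catches invented table and column names. It says nothing about whether a query that compiles actually answers the question that was asked.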
Datasette Agent ships with three plugins at launch. One generates charts using Observable Plot, a JavaScript visualization library. One taps OpenAI for image generation. One runs code in a Fly Sprites sandbox. Willison noted that Claude Code and OpenAI Codex, the AI coding assistants, have been useful for writing new plugins. Point them at the repo, tell them what you want, and they write it. That claim is verifiable in a way that the model reliability claims are not: you can go look at the plugin source code and see whether the commits credit a model.
The 81 commits on the datasette-agent repository tell a different story than a press release. This is a project with history, with iterated experiments, with Willison as the primary author and occasional collaborators. It is alpha software. But it is not vaporware.
The core empirical question that nobody has answered yet is whether local models handle SQL generation in agent loops reliably enough for production use. The demo works. The fabrication case is not the same task. The gap between the two is where the actual answer lives.
What Willison has built is a test case, not a verdict. Datasette Agent gives you a reproducible experiment: run the same queries across Gemini 3.1 Flash-Lite and gemma-4-26b-a4b via LM Studio, score them on syntactic validity and semantic correctness, and you find out where the local model actually breaks. That is a useful thing to have. The field has a lot of announcements about local AI agents. This one comes with instructions.
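One way to score the experiment described above is sketched below. It treats a query as syntactically valid if SQLite executes it, and as semantically correct if it returns the same rows as a hand-written reference query. Everything here is an assumption for illustration: the sales table, the reference query, and the candidate queries are hand-written stand-ins for model output, and no Datasette Agent or LM Studio API is called.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the fetched rows, or None if SQLite rejects the query."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def score(conn, generated_sql, reference_sql):
    """Score one generated query against a trusted reference query."""
    expected = run_query(conn, reference_sql)
    if expected is None:
        raise ValueError("reference query failed to run")
    got = run_query(conn, generated_sql)
    valid = got is not None
    # Compare rows as multisets so that row order does not affect the score.
    correct = valid and Counter(got) == Counter(expected)
    return {"valid": valid, "correct": correct}


# Hypothetical test database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 5.0), ("south", 7.0)],
)

# Hand-written reference for the question "What are total sales per region?"
reference = "SELECT region, SUM(amount) FROM sales GROUP BY region"

# Hand-written stand-ins for what different models might generate.
candidates = {
    "reordered_but_right": "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total",
    "runs_but_wrong": "SELECT region, COUNT(amount) FROM sales GROUP BY region",
    "does_not_run": "SELECT region, SUM(total) FROM sales GROUP BY region",
}

results = {name: score(conn, sql, reference) for name, sql in candidates.items()}
```

In a real comparison the candidate queries would come from each model, and they should be run against a read-only copy of the database, since a generated statement is not guaranteed to be a SELECT.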