The Sync-Async Gap Has a Four-Worker Fix

PREVIEWThe Sync-Async Gap Has a Four-Worker Fix · MD

Simon Willison released llm-all-models-async 0.1 on March 31, a plugin that wraps sync-only language model plugins and exposes them as async-compatible. The mechanism is a ThreadPoolExecutor with four workers, bridging a sync plugin to async code that expects a non-blocking interface. It's a thin shim, and that's the point.

The plugin exists because Datasette, Willison's data exploration tool, only supports async LLM models, while his llm-mrchatterbox plugin runs Mr. Chatterbox, a 340 million parameter language model trained on 28,035 Victorian-era British Library books from 1837 to 1899. Mr. Chatterbox is sync-only. Datasette's LLM enrichment features need async. The gap was real.

Getting there required a core change. The LLM 0.30 changelog, also released March 31, adds an optional model_aliases parameter to the register_models() plugin hook. This lets a plugin inspect what other plugins have already registered and make different decisions based on that information. It's a coordination primitive for a plugin ecosystem that has grown large enough to have genuine integration conflicts.

"The register_models() plugin hook now takes an optional model_aliases parameter listing all of the models, async models and aliases that have been registered so far by other plugins," the changelog reads. "A plugin with @hookimpl(trylast=True) can use this to take previously registered models into account."

That sounds mundane. It is, in the best way. The LLM plugin ecosystem has reached the stage where the core needs to expose information about plugin registration order so plugins can adapt to each other. That's a dependency graph becoming visible at the infrastructure level.

Willison wrote the plugin with help from Claude Code, which he noted was the first time he'd had an AI build a full LLM model plugin from scratch. The mrchatterbox plugin itself took a circuitous path: Trip Venturella trained the model using Andrej Karpathy's nanochat, filtered the British Library corpus to Victorian-era texts with an OCR confidence threshold of 0.65, and used GPT-4o-mini and Claude Haiku to generate synthetic conversation pairs for the supervised fine-tuning step. The model weights are 2.05GB on disk.

The async shim is the least interesting technical part of this story, but the most interesting ecosystem signal. Plugin architectures accumulate integration tax as they mature. At some point the core has to start mediating between plugins that weren't designed to coexist. LLM 0.30 is that inflection point: the hook that lets one plugin know what another plugin has already done.

The Sync-Async Gap Has a Four-Worker Fix — type0 | type0

The Sync-Async Gap Has a Four-Worker Fix

Sources