The Maintainer of llama.cpp Uses a Local 27B Model for Daily Coding. He Says the Bottleneck Is Still Pull Request Review


The person who built the open-source toolchain that lets local AI models run on a regular laptop has spent the last six weeks running one of those models for his own coding work. The bottleneck, he says, is not generating code. It is reading other people's.

Georgi Gerganov, the original creator and current maintainer of llama.cpp, a widely used C/C++ library that lets open-weight AI models run on consumer hardware, described his daily setup in a Hacker News comment on Vicki Boykis's recent essay on local model tooling. The model is Qwen3.6-27B, a large coding model from the Qwen team. Gerganov called it "a very capable local model for coding tasks" and said he has been using it almost daily for "the last month and a half."

The hardware is ordinary. He runs Qwen3.6-27B on an Apple M2 Ultra desktop and on a separate box with an Nvidia RTX 5090 graphics card, machines a working developer might already own. The harness around the model is deliberately minimal: a stripped-down version of the "pi" coding agent, launched with the command pi -nc --offline and paired with a short system prompt that nudges the model's output toward his own style.

The Boykis essay that prompted Gerganov's comment argues that running models locally has finally become practical for ordinary work, partly because the surrounding tooling has matured. Gerganov's setup is a working example of that thesis, with a caveat Boykis did not foreground. The work he hands to the model is "small mundane tasks at ggml-org," the GitHub organization that hosts llama.cpp and related projects. He does not describe the output as a showcase. "Nothing really impressive," he wrote in the comment, which was also collected in Simon Willison's roundup of the quote.

The next line of the comment is the part that matters. Asked whether he would lean on the model more, Gerganov pointed at the part of his job that has not changed. "I'd use it more if PR review didn't consume most of my time," he wrote. The inference, drawn from the quote itself, is that the local model can carry a slice of the writing side of a maintainer's day. The reading side, where someone else wrote the patch and you have to judge it, is still a human job.

The signal here is narrower than the usual local-vs-hosted contest. It is not about whether a 27-billion-parameter open-weight model can match a frontier hosted model on a benchmark. It is about whether the local model can carry the part of a maintainer's day that the maintainer is happy to stop doing. Gerganov's answer is yes, inside a narrow lane, and he is the source on it. The pull request queue is the lane he is not yet willing to hand over.

Gerganov is not an outside observer. He maintains the library that makes this workflow possible, which gives the quote a different weight than a generic developer endorsement. It also limits how far the signal travels. He is describing his own workload, on his own projects, with a specific model and a specific harness. The picture he draws is small but useful: the local-coding floor has reached the point where one of the people who built the floor runs it for real work, and the rest of the day's bottlenecks are still the ones the floor was never going to fix.

