The latest version of OpenSquilla, an open-source AI coding assistant, doesn't let an agent finish a job until the agent has proved the code works. The 0.4.0 release wraps a new "coding mode" around a red-green-regression chain: it runs a test the agent writes, requires the agent's edit to pass that test, and then re-runs the project's existing tests to confirm nothing broke. Only then does the change land in source files.
That mechanic is the real news. It shifts who owns the call on whether AI-written code is "done." Today, when a model rewrites a function or fixes a bug, a human reviewer makes that judgment. Under OpenSquilla 0.4.0's new pipeline, that decision is encoded: edits live in an isolated sandbox until the agent's own tests come back green and the wider test suite still passes; if any step fails, the agent keeps iterating inside an auto-repair loop until the loop's iteration cap is hit (OpenSquilla 0.4.0 Release Notes, CLI documentation).
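The gate described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenSquilla's implementation: the function `red_green_regression_gate`, the `GateResult` type, the callback parameters, and the default cap of three attempts are all hypothetical names and values chosen for the example. The initial "red" check (the new test must fail on the unmodified code) follows the usual meaning of red-green testing and is also an assumption about how the chain behaves.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class GateResult:
    accepted: bool          # True only if the edit may leave the sandbox
    attempts: int           # how many candidate edits were tried
    code: Optional[Any]     # the accepted candidate, or None
    reason: str


def red_green_regression_gate(
    original: Any,
    propose_edit: Callable[[Any, str], Any],   # hypothetical: the agent's edit step
    new_test: Callable[[Any], bool],           # the test the agent wrote
    regression_suite: Callable[[Any], bool],   # the project's existing tests
    max_attempts: int = 3,                     # assumed cap, not a documented value
) -> GateResult:
    # Red: a new test that already passes on the old code proves nothing.
    if new_test(original):
        return GateResult(False, 0, None, "new test already passes (no red)")

    feedback = "initial attempt"
    for attempt in range(1, max_attempts + 1):
        candidate = propose_edit(original, feedback)
        # Green: the candidate must make the new test pass.
        if not new_test(candidate):
            feedback = "new test still failing"
            continue
        # Regression: the existing suite must still pass.
        if not regression_suite(candidate):
            feedback = "regression in existing tests"
            continue
        return GateResult(True, attempt, candidate, "green and no regressions")

    return GateResult(False, max_attempts, None, "attempt cap reached: " + feedback)


# Toy usage: the "code" under edit is just a Python function.
def buggy_abs(x):
    return x


def broken_abs(x):
    return -x  # fixes negatives, breaks positives


def fixed_abs(x):
    return -x if x < 0 else x


candidates = iter([broken_abs, fixed_abs])
result = red_green_regression_gate(
    original=buggy_abs,
    propose_edit=lambda code, feedback: next(candidates),
    new_test=lambda f: f(-2) == 2,
    regression_suite=lambda f: f(3) == 3 and f(0) == 0,
)
# The first candidate is rejected by the regression suite; the second is accepted.
```

In this sketch nothing is written back unless `accepted` is true, which mirrors the article's point that the pass/fail decision sits in the runtime, not with the operator.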
The company frames this as "the first time AI coding can self-verify." That framing needs unpacking. Loop-style test feedback is not novel: adjacent projects, including SWE-agent and Aider, run generated code against test suites as part of their workflows. What OpenSquilla 0.4.0 does is package red-green-regression as the default gate for every code change, not an opt-in flag. That packaging matters because the verification cost now sits on the runtime, not the operator (qbitai announcement).
The demo case the team is leading with is a feature add to Andrej Karpathy's micrograd autodiff library: OpenSquilla's coding mode wrote the change, and the result was cross-checked against PyTorch on the same problem, with values reported to match to 10 decimal places on both the forward pass and every gradient. That match is the project's own demonstration, not an independent replication, and should be read as a showcase rather than a benchmark (qbitai announcement).
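To make the "matching to 10 decimal places" check concrete, here is a minimal, standard-library-only sketch of that kind of cross-check. It is not the project's demo and it does not use micrograd or PyTorch: the `Value` class is a small micrograd-style scalar autodiff written for this example, the reference values are hand-derived analytic derivatives standing in for PyTorch, and `matches` and the sample inputs are hypothetical.

```python
import math


class Value:
    """Tiny scalar reverse-mode autodiff node (illustrative, micrograd-style)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # pushes this node's grad to its parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def backward_fn():
            self.grad += (1.0 - t * t) * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Visit nodes in reverse topological order so each grad is complete
        # before it is propagated further.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()


def matches(a, b, places=10):
    """True if a and b agree to the given number of decimal places."""
    return abs(a - b) < 0.5 * 10 ** (-places)


# y = tanh(x * w + b), with arbitrary sample inputs.
x, w, b = Value(0.5), Value(-1.5), Value(0.25)
y = (x * w + b).tanh()
y.backward()

# Reference values derived by hand: dy/dz = 1 - tanh(z)^2, with z = x*w + b.
z = 0.5 * -1.5 + 0.25
dy_dz = 1.0 - math.tanh(z) ** 2
checks = {
    "forward": matches(y.data, math.tanh(z)),
    "dy/dx": matches(x.grad, dy_dz * -1.5),
    "dy/dw": matches(w.grad, dy_dz * 0.5),
    "dy/db": matches(b.grad, dy_dz),
}
```

A check like this confirms agreement with the chosen reference on the chosen inputs only, which is one reason a single matching demo reads as a showcase rather than a benchmark.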
The self-verification chain sits on top of work the team has been doing on agent intelligence per unit cost. OpenSquilla's broader pitch is what it calls a "Learnable Harness": local routing that picks models by task complexity, loads skills and memory on demand, and preprocesses tool results before invoking the model (OpenSquilla official site). On routing specifically, the announcement cites an earlier 硅星人 report claiming OpenSquilla's smart routing beats OpenRouter by roughly 4.4 percentage points in precision and runs about 75% cheaper on comparable tasks, with output quality "basically on par" with flagship models. Those figures come via the qbitai write-up and were not independently re-verified. Treat them as an attributed claim pending a primary source (qbitai announcement).
The release also ships OpenSquilla's first signed and notarized desktop installer for macOS and Windows, so non-developers can install the agent with a double-click instead of a CLI (OpenSquilla CHANGELOG, OpenSquilla 0.4.0 Release Notes). Adoption signals so far are limited to GitHub: per the announcement, the project's star count reached the low thousands within weeks of its earlier public launch. Independent figures on paying users, enterprise traction, or monthly active developers aren't in the available material (qbitai announcement).
What to watch next is whether the "code must prove itself before shipping" pattern becomes the default across other open-source coding agents, and whether OpenSquilla publishes a version of the micrograd-style demonstration under independent evaluation. Until then, 0.4.0 is best read as a packaging bet: the agent runtime now carries the verification cost the human reviewer used to carry.