The real story in ModelBrew's continual learning benchmarks is the architecture. The startup freezes a base large language model and stacks small, per-domain adapters on top of it. Each adapter holds one skill. The base never gets touched. That design choice, not a clever training loop, is why the company's reported forgetting rate sits near zero across five sequential fine-tuning passes. The live continual learning capability pitched in the Reddit announcement, meanwhile, remains a private beta on a separate timeline.
The problem has a name in the field: catastrophic forgetting, which is what happens when a neural network gets retrained on new data and partially overwrites what it already knew. Continual learning is the broader effort to teach models new skills sequentially without erasing old ones. Most existing remedies, including regularization, replay buffers, and architectural growth, are only partial fixes, a point the company's own background paper on the failure modes lays out. ModelBrew's CRMA engine, which the company says is patent pending, sidesteps the problem by not retraining the base at all.
In its published benchmark on Mistral-7B, the company reports −0.17% backbone drift across five sequential domain adapters, averaged over three seeds (±0.17). A naive LoRA baseline running the same chain forgets 43.0% of the original capability. (LoRA is a popular parameter-efficient fine-tuning method that adds small trainable matrices to a frozen model.) In a separate inference test on Gemma-2-9B, the company's product page reports 98 out of 100 held-out questions answered correctly with CRMA toggled on, versus 38 out of 100 with CRMA off, and the company-reported 95% confidence intervals do not overlap.
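The non-overlap claim can be sanity-checked with a few lines of arithmetic. The sketch below assumes Wilson score intervals at z = 1.96. The article's sources do not say which interval method the company used, so the exact bounds may differ from the company's, and `wilson_interval` is a helper written for this illustration only.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


# Counts taken from the product-page figures quoted above.
on_low, on_high = wilson_interval(98, 100)    # CRMA on
off_low, off_high = wilson_interval(38, 100)  # CRMA off

# The intervals overlap only if the "off" upper bound reaches the "on" lower bound.
intervals_overlap = off_high >= on_low
print(f"CRMA on:  [{on_low:.3f}, {on_high:.3f}]")
print(f"CRMA off: [{off_low:.3f}, {off_high:.3f}]")
print("overlap:", intervals_overlap)
```

Under that assumption the two intervals are well separated, which is consistent with the company's statement. The check says nothing about how the 100 questions were selected or graded.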
Those numbers are striking, but two caveats matter. First, all results were evaluated by the first author. The company's research roadmap lists a blinded two-rater audit as a planned next step, but it has not been published. Second, the "Live Continual Learning" capability implied by the original Reddit r/MachineLearning post is, per the company's own product page, a "private beta soon" feature, and it is distinct from the training-time CRMA numbers being shown. The shipped benchmarks measure how well a stack of adapters preserves the underlying model after sequential training. They do not yet measure single-pass, no-retraining updates to a deployed adapter.
That gap matters because the marketing pitch and the published results describe two different things. The architecture is real. The numbers are company-reported. The live-update claim is on a separate timeline.
Why the frozen-base design holds up
The reason the architecture works is mechanical. A standard fine-tune updates the weights of the model itself, so a second fine-tune on different data competes for the same parameters. Adapter-based fine-tuning, by contrast, injects a small set of new weights, typically low-rank matrices, into each layer and trains only those. The base model is frozen, so its original capability is preserved by construction. ModelBrew goes one step further: each customer trains a self-contained adapter, the adapter sits next to a frozen base, and switching adapters routes a query to the right skill.
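The mechanism described above can be sketched in a toy NumPy example. This is a generic low-rank adapter on a single frozen weight matrix, not ModelBrew's CRMA engine, whose internals are not public. The class name `Adapter`, the sizes, the training targets, and the dictionary lookup used for routing are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, R = 8, 2  # hidden size and adapter rank (illustrative values)

# Frozen base weights: marked read-only so any in-place update raises an error.
W_BASE = rng.normal(size=(D, D))
W_BASE.setflags(write=False)


class Adapter:
    """Hypothetical low-rank adapter: output = x W^T + (x A^T) B^T."""

    def __init__(self):
        self.A = rng.normal(scale=0.1, size=(R, D))
        self.B = np.zeros((D, R))  # zero init: the adapter starts as a no-op

    def forward(self, x):
        return x @ W_BASE.T + (x @ self.A.T) @ self.B.T

    def train_step(self, x, target, lr=0.01):
        """One gradient step on mean squared error; only A and B change."""
        h = x @ self.A.T                           # (n, R)
        err = (self.forward(x) - target) / len(x)  # (n, D)
        grad_B = err.T @ h                         # (D, R)
        grad_A = (err @ self.B).T @ x              # (R, D)
        self.A -= lr * grad_A
        self.B -= lr * grad_B


# One adapter per domain; routing here is a plain lookup by domain label.
adapters = {"medical": Adapter(), "finance": Adapter()}
x = rng.normal(size=(16, D))
base_snapshot = W_BASE.copy()

# Train the domains sequentially, as in a fine-tuning chain.
for _ in range(20):
    adapters["medical"].train_step(x, target=np.ones((16, D)))
medical_before = adapters["medical"].forward(x)

for _ in range(20):
    adapters["finance"].train_step(x, target=-np.ones((16, D)))
medical_after = adapters["medical"].forward(x)

print("base unchanged:", np.array_equal(W_BASE, base_snapshot))
print("medical output unchanged:", np.array_equal(medical_before, medical_after))
```

In this toy setup the second training run cannot disturb the first skill, because the two adapters share no trainable parameters and the base is read-only. Whether that isolation holds in a production system depends on details the sketch does not model.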
A HuggingFace post by the company's founder describes a different five-domain chain (Medical, Enterprise, Finance, Military, Real Estate) on Mistral-7B. The post reports 26 out of 31 (84%) correct across phases, 74% accuracy on the Medical domain across all five phases, and 31 out of 31 routing decisions correct. The fifth-domain loss was reported at 0.0098. That domain list does not match the one on the company's product page, which lists Medical, Legal, Finance, Code, and General. The discrepancy is small but worth noting for any reader trying to reconcile the two.
"Exact unlearning" as a side effect
Because each customer's adapter is self-contained, deleting it removes that customer's training signal without touching anyone else's data or the base model. The company markets this as "exact unlearning" and ties it to California's CCPA privacy deletion rules and the state's Assembly Bill 1008. The company also issues SHA-256 erasure certificates and maintains a hash-chained audit trail.
This is a meaningful architectural claim, not a benchmark. It depends on the adapter-isolation design holding in production, on the deletion path actually removing the right files, and on no shared weights or hidden state carrying residual signal between adapters. Each of those is plausible given the design but has not been independently verified.
What it costs and who can run it
The product is priced at $3.99 per million tokens, with a free tier limited to three runs per day on the smaller TinyLlama-1.1B model. The paid tier supports LoRA and QLoRA fine-tuning on Mistral, Llama-3.1, Saul, Qwen3, and Gemma-2 in the 7B to 9B range, with per-domain adapters trained on a frozen base.
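For a rough sense of scale, the per-token price translates into job cost as follows. The sketch assumes billing is based on dataset tokens multiplied by the number of training epochs. That billing rule is a guess for illustration, not something the pricing page is quoted as saying, and `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE_PER_MILLION_TOKENS = Decimal("3.99")  # price quoted above


def estimate_cost(dataset_tokens: int, epochs: int = 1) -> Decimal:
    """Estimated cost in dollars, assuming tokens are billed once per epoch."""
    billed_tokens = Decimal(dataset_tokens) * epochs
    cost = billed_tokens / Decimal(1_000_000) * PRICE_PER_MILLION_TOKENS
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Example: a 2.5M-token dataset trained for three epochs (made-up workload).
example_cost = estimate_cost(2_500_000, epochs=3)
print(f"${example_cost}")
```

If the actual billing counts each token only once regardless of epochs, the same function with `epochs=1` gives the lower figure.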
What would change the picture
Three things would move this story from promising architecture with vendor-reported numbers to something more solid. First, the blinded two-rater audit that the company says is on the roadmap needs to land, ideally with the rater identities and the test set published alongside the numbers. Second, the live continual learning module needs its own benchmarks on the single-pass, no-retraining scenario that the Reddit announcement implies, not the training-time numbers from the same product page. Third, the domain list discrepancy between the marketing site and the founder's HuggingFace post needs an official reconciliation.
Until then, the architecture explains the headline numbers, the headline numbers remain company-evaluated, and the live-update claim stays a forward-looking statement.