The headline-grabbing claim from ServiceTitan's recent QCon AI talk was that legacy code migration timelines had collapsed from years to weeks. The load-bearing engineering decision underneath that claim had nothing to do with the model. It was an assembly-line decomposition pattern wrapped in deterministic validation loops, and the talk's actual lesson for other engineering teams is that the loops are the product.
David Stein, Principal AI Engineer at ServiceTitan, presented the pattern in his QCon AI session "Moving Mountains: Migrating Legacy Code in Weeks instead of Years". The pattern, in his framing, treats a legacy refactor the way a factory treats a complex assembly: break the work into many small standardized tasks, run them in parallel across many AI agents, and gate every step with programmatic checks. The agents do not get to decide what success looks like. The validation infrastructure does.
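The assembly-line shape is easy to see in code. Below is a minimal sketch in Python. A stub function stands in for a real model call, and the names `Task`, `run_task`, `run_line`, and the `max_attempts` retry budget are hypothetical illustrations, not ServiceTitan's implementation. Each small task runs in parallel, the agent proposes candidates, and only the validators decide whether a candidate is accepted.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Task:
    task_id: str
    source: str  # the small, bounded fragment this unit of work touches

@dataclass(frozen=True)
class Result:
    task_id: str
    output: str
    passed: bool
    attempts: int

# Hypothetical interfaces: an agent proposes a candidate (possibly a different
# one on each attempt); a validator is a deterministic pass/fail check.
Agent = Callable[[Task, int], str]
Validator = Callable[[Task, str], bool]

def run_task(task: Task, agent: Agent, validators: List[Validator],
             max_attempts: int = 3) -> Result:
    """Ask the agent for candidates until every validator passes or the budget runs out."""
    candidate = task.source
    for attempt in range(max_attempts):
        candidate = agent(task, attempt)
        # The agent never reports its own success; only the validators decide.
        if all(check(task, candidate) for check in validators):
            return Result(task.task_id, candidate, True, attempt + 1)
    return Result(task.task_id, candidate, False, max_attempts)

def run_line(tasks: List[Task], agent: Agent, validators: List[Validator],
             workers: int = 8) -> List[Result]:
    """Run independent tasks in parallel; results come back in task order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: run_task(t, agent, validators), tasks))
```

Threads here stand in for whatever parallel workers a real system would use. Because every task is independent and its acceptance is machine-checked, adding workers changes throughput but not the definition of "done".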
The decomposition step is what makes the parallelism possible. A monolithic "rewrite the billing module" prompt does not parallelize, and a probabilistic model given that prompt can hallucinate in directions no reviewer can reliably catch. Stein's pattern pushes the work toward smaller, well-bounded operations: rename an identifier across a known surface, migrate a single call site to a typed interface, port one module's dependency graph. Each task has an interface a validator can inspect. Each validator is, in Stein's words, "programmatically rigid," meaning the same checks run the same way on the same inputs and return a pass/fail verdict that a human or downstream agent can trust. The talk is explicit that this layer is designed to contain LLM hallucinations, catching bad output before it is accepted, rather than to stop the model from producing it.
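To make "programmatically rigid" concrete, here is a hypothetical validator for the first task type listed above, an identifier rename. It assumes Python source and uses the standard-library `ast` module. A candidate passes only if it parses and its syntax tree equals the original's tree with the rename applied, so any extra edit an agent slips in fails deterministically. The function name and the restriction to `ast.Name` nodes (variable and call-site names, not definitions or attributes) are illustrative assumptions.

```python
import ast
from typing import Tuple

class _RenameNames(ast.NodeTransformer):
    """Rewrite every ast.Name whose id is `old` to `new`."""
    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == self.old:
            node.id = self.new
        return node

def validate_rename(original: str, candidate: str,
                    old: str, new: str) -> Tuple[bool, str]:
    """Pass only if `candidate` is `original` with `old` renamed to `new` and nothing else."""
    try:
        candidate_tree = ast.parse(candidate)
    except SyntaxError as exc:
        return False, f"candidate does not parse: {exc.msg}"
    expected_tree = _RenameNames(old, new).visit(ast.parse(original))
    # ast.dump omits line/column attributes by default, so layout-only changes
    # (and comment edits, which ast does not keep) pass, while any structural
    # change beyond the rename fails.
    if ast.dump(candidate_tree) != ast.dump(expected_tree):
        return False, "candidate is not a pure rename of the original"
    return True, "ok"
```

Run twice on the same inputs, this check returns the same verdict, which is the property that lets a downstream agent or reviewer trust its pass/fail without rereading the diff.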
The speed claim is downstream of that discipline. Weeks instead of years is what falls out when the work is decomposed into many parallel tracks, when each track's acceptance criteria are machine-checkable rather than judgment calls, and when the validation loop is fast enough to keep agents productive. Stein's talk credits the model with being one component of the result, not the driver. Any team trying to replicate the outcome on a different codebase will have to solve the decomposition and validation problems first, and may find that the model is the cheapest piece of the system to swap.
For an engineering team evaluating this approach, the replication checklist is roughly: identify work units small enough that each can be checked like a unit test; write deterministic validators for each unit before deploying agents; instrument pass/fail rates per validator; reserve humans for the small set of failures the validators cannot classify. ServiceTitan's own scale and the specific shape of its codebase shaped its results, and Stein is presenting a practitioner account rather than publishing reproducible benchmarks. The pattern is what transfers; the exact timeline almost certainly does not.
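The last two checklist items can be wired together in a few lines. This sketch assumes a three-way outcome (`PASS`, `FAIL`, `UNKNOWN`) instead of a bare boolean, which is an illustrative choice and not something the talk specifies: confident failures go back to agents, while anything a validator cannot classify, including a validator that raises an exception, lands in a human review queue. `Outcome`, `classify`, and `ValidatorStats` are hypothetical names.

```python
from collections import Counter
from enum import Enum
from typing import Callable, Dict, List, Tuple

class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"        # validator rejected the candidate: send it back to an agent
    UNKNOWN = "unknown"  # validator could not decide: route it to a human

def classify(check: Callable[..., bool], *args) -> Outcome:
    """Run a validator; treat a crash inside the validator as 'cannot classify'."""
    try:
        return Outcome.PASS if check(*args) else Outcome.FAIL
    except Exception:
        return Outcome.UNKNOWN

class ValidatorStats:
    """Per-validator outcome counts plus a queue of cases only a human can settle."""
    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = {}
        self.human_queue: List[Tuple[str, str]] = []  # (task_id, validator name)

    def record(self, task_id: str, validator: str, outcome: Outcome) -> None:
        self.counts.setdefault(validator, Counter())[outcome] += 1
        if outcome is Outcome.UNKNOWN:
            self.human_queue.append((task_id, validator))

    def pass_rate(self, validator: str) -> float:
        counts = self.counts.get(validator, Counter())
        total = sum(counts.values())
        return counts[Outcome.PASS] / total if total else 0.0
```

Tracking rates per validator, rather than one global success number, shows which specific check is rejecting work or failing to decide, and the human queue contains only the cases no validator could settle.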
The honest limits are visible in the source itself. The talk is a 50-minute QCon AI presentation, hosted on InfoQ, by a ServiceTitan engineer describing ServiceTitan's approach. No independent benchmark, public dataset, or third-party audit accompanies it. The hallucination risk the validation loops are designed to contain is real and well documented elsewhere. The talk's contribution is the engineering wrapper, not a claim that the underlying problem has been solved. Other teams reading this should adopt the pattern, not the timeline.