A generation ago, software development was plagued by "works on my machine" failures until Git popularized content-addressed storage: every version is stored by what it contains, not where it lives, which turns version control into a permanent, verifiable ledger. Machine learning has the same problem at a larger scale: experiments that cannot be reproduced, pipelines that rerun everything when only one step changes, and compute wasted on duplicating work that has already been done. Shopify has been building a fix, and in a Latent Space podcast interview published today, its CTO Mikhail Parakhin disclosed the results.
The internal numbers are concrete. Shopify says its automated research loop, Tangent, drove a roughly fivefold increase in search query throughput, from 800 queries per second to 4,200, on the same hardware, by finding code-level inefficiencies in the serving stack that humans had missed, according to Parakhin. Daily active AI tool users have approached 100 percent of engineers since December 2025, when a step change in model quality triggered a phase transition in internal adoption. Month-on-month growth in merged PRs rose from 10 percent to 30 percent after AI coding tools went wide, and code complexity per merge is increasing, meaning each merged change is larger and more ambitious, not just more frequent. The company says it has saved more than a year of cumulative compute time since adopting its ML workflow engine, Tangle, which uses content-addressed caching so that when one step in a pipeline changes, only that step reruns rather than the full workflow.
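The caching idea can be illustrated with a minimal Python sketch. This is not Tangle's actual API; the `CachedPipeline` class, the step names, and the version strings are all hypothetical, and the only assumption is that a step's cache key is a hash of the step's identity plus the content of its inputs.

```python
import hashlib
import json


class CachedPipeline:
    """Toy content-addressed step cache (illustrative only, not Tangle's API)."""

    def __init__(self):
        self.cache = {}     # cache key -> step output
        self.executed = []  # names of steps that actually ran

    @staticmethod
    def digest(obj):
        # Hash a canonical JSON encoding so equal content yields equal keys.
        payload = json.dumps(obj, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def run_step(self, name, version, func, inputs):
        # The key depends on what the step is (name, version) and on the
        # content of its inputs, not on when or where the step runs.
        key = self.digest({"step": name, "version": version, "inputs": inputs})
        if key not in self.cache:
            self.cache[key] = func(inputs)
            self.executed.append(name)
        return self.cache[key]


def run_pipeline(pipe, raw, train_version):
    # Three hypothetical steps: drop missing values, rescale, "train".
    cleaned = pipe.run_step(
        "clean", "v1", lambda xs: [x for x in xs if x is not None], raw
    )
    scaled = pipe.run_step(
        "scale", "v1", lambda xs: [x / max(xs) for x in xs], cleaned
    )
    return pipe.run_step(
        "train", train_version, lambda xs: sum(xs) / len(xs), scaled
    )


pipe = CachedPipeline()
run_pipeline(pipe, [2, None, 4, 8], "v1")
first_run = list(pipe.executed)   # every step runs on a cold cache

pipe.executed.clear()
run_pipeline(pipe, [2, None, 4, 8], "v2")  # only the train step changed
second_run = list(pipe.executed)  # upstream steps are served from cache

print(first_run, second_run)
```

In this sketch the changed step is the last one, so it is the only one that reruns. If an earlier step changed and produced different output, the steps downstream of it would see new input content and rerun as well, while anything upstream would still be served from the cache.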
The merchant-facing product built on top of this stack is SimGym, which Shopify opened to all eligible merchants on March 11, 2026, without a waitlist. SimGym runs AI agents through a merchant's storefront to simulate customer behavior and predicts whether proposed changes will lift conversions. Shopify reports that its recommendations correlate at 0.7 with real add-to-cart events — a figure it spent nearly a year optimizing and that it believes competitors cannot easily replicate without decades of real purchase history. The App Store rating for SimGym is 2.9 stars based on eight reviews; one merchant reviewer flagged that simulated traffic from SimGym's AI agents was contaminating their Google Analytics conversion data, making it harder to distinguish simulated performance from actual customer behavior.
Whether that skepticism is warranted is genuinely unclear. For large merchants with abundant real traffic, the marginal value of a simulation whose predictions correlate at 0.7 with real behavior, compared with a direct A/B experiment, is debatable. For small merchants who cannot run statistically significant tests at all, even an imperfect simulation may be the most useful signal available. Parakhin's clearest argument for why SimGym cannot be easily copied is data: "Who else would have that data?" he asked on the podcast, referring to the purchase history needed to train a simulation that correlates with real behavior. The implication is that SimGym works because Shopify has been collecting commerce data for decades, so a competitor launching today could not replicate it by writing better code alone. Whether the skepticism softens as more merchants use SimGym, and whether the 2.9-star reception improves, are the questions to watch over the next several months.
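For a sense of scale on the 0.7 figure, here is a small Python sketch. Shopify has not said which correlation measure it uses, so the Pearson coefficient here is an assumption, and the five storefront variants and their rankings are made-up toy data, not SimGym output.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    # Pearson correlation: covariance divided by the product of the spreads.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def concordant_share(xs, ys):
    # Share of variant pairs that both rankings put in the same order.
    pairs = list(combinations(range(len(xs)), 2))
    agree = sum(1 for i, j in pairs if (xs[i] - xs[j]) * (ys[i] - ys[j]) > 0)
    return agree / len(pairs)


# Hypothetical data: how a simulation ranks five variants versus how
# real add-to-cart results rank the same variants.
simulated_rank = [1, 2, 3, 4, 5]
observed_rank = [2, 3, 1, 4, 5]

r = pearson(simulated_rank, observed_rank)
shared_variance = r * r
pairwise_agreement = concordant_share(simulated_rank, observed_rank)

print(round(r, 2), round(shared_variance, 2), pairwise_agreement)
# prints: 0.7 0.49 0.8
```

On this toy data a correlation of 0.7 coincides with about half the variance shared and with 8 of 10 pairwise orderings matching, which shows that the coefficient measures how closely two series move together and does not translate directly into a hit rate.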
Tangle and Tangent are available to ML practitioners via GitHub and have attracted external adoption, Parakhin said. The broader lesson from Shopify's internal numbers may be that the real bottleneck in AI-assisted coding is no longer code generation — it is review, CI/CD, and deployment stability. As models write more code faster, the constraint is how quickly human operators can evaluate and absorb it.