A San Francisco startup called Gimlet Labs has raised an $80 million Series A to solve one of the most persistent inefficiencies in AI infrastructure: the gap between where models run and where they run best.
Gimlet Labs, which emerged from stealth five months ago, builds software that lets frontier AI models run across different chip architectures simultaneously, splitting a single model across NVIDIA, AMD, Intel, ARM, Cerebras, and d-Matrix hardware and routing each portion of the computation to whichever chip handles it fastest. The company disclosed the round, which brings its total raised to $92 million, in a press release published Monday via GlobeNewswire. Menlo Ventures led the round, with participation from Factory, Eclipse, Prosperity7, and Triatomic. Several notable angels also participated, including Bill Coughran of Sequoia Capital, Stanford professor Nick McKeown, former VMware CEO Raghu Raghuram, and Intel CEO Lip-Bu Tan, per TechCrunch.
The pitch is straightforward: the AI industry is leaving enormous amounts of hardware capacity on the table. According to TechCrunch, AI applications keep already-deployed hardware busy only 15 to 30 percent of the time, wasting hundreds of billions of dollars in idle resources at a moment when the industry is on pace to spend $650 billion on AI data center capital expenditures this year. Gimlet's software is designed to close that gap by matching each computational workload to the chip best suited for it, rather than locking inference to whatever hardware a customer happens to own.
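What that scheduling might look like in practice is easy to sketch. The snippet below is a toy illustration, not Gimlet's scheduler: the chip names echo the vendors the company supports, but every latency and transfer figure is an invented assumption, and the greedy per-layer assignment is just one plausible way to match portions of a model to the fastest available silicon.

```python
# Illustrative sketch of per-layer placement across a heterogeneous fleet.
# The chip names mirror vendors Gimlet supports; the latency and transfer
# numbers are invented for illustration and come from no benchmark.

# Estimated per-layer latency in milliseconds for each chip type.
LAYER_LATENCY_MS = {
    "attention": {"nvidia": 1.2, "amd": 1.4, "cerebras": 0.6, "d-matrix": 1.1},
    "mlp":       {"nvidia": 0.9, "amd": 0.8, "cerebras": 1.0, "d-matrix": 0.5},
    "embedding": {"nvidia": 0.3, "amd": 0.3, "cerebras": 0.4, "d-matrix": 0.2},
}

# Assumed cost (ms) to move activations between two different chip types.
TRANSFER_MS = 0.15

def place_layers(layer_kinds: list[str]) -> list[tuple[str, str]]:
    """Greedily assign each layer to its fastest chip, charging a
    transfer penalty whenever consecutive layers land on different chips."""
    placement, prev_chip = [], None
    for kind in layer_kinds:
        candidates = LAYER_LATENCY_MS[kind]
        # Pick the chip minimizing layer latency plus any hop cost.
        chip = min(
            candidates,
            key=lambda c: candidates[c] + (TRANSFER_MS if c != prev_chip else 0.0),
        )
        placement.append((kind, chip))
        prev_chip = chip
    return placement

if __name__ == "__main__":
    model = ["embedding"] + ["attention", "mlp"] * 3
    for layer, chip in place_layers(model):
        print(f"{layer:<10} -> {chip}")
```

A real system would also have to weigh memory capacity, batch size, and network topology; the point of the toy is only that once per-chip cost estimates exist, the placement itself is a tractable optimization.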
The technical foundation comes from research published last year. In July 2025, Gimlet cofounders Asgar, Nguyen, and Katti posted a preprint to arXiv (arXiv:2507.19635) finding that a heterogeneous combination of older-generation GPUs paired with newer accelerators can deliver total cost of ownership comparable to latest-generation homogeneous GPU infrastructure. The paper is the most technically rigorous source in the mix, though it is a founder-authored preprint rather than independent, peer-reviewed work; it still carries far more detail than a press release.
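The paper's core claim is an economic one, and the shape of the argument is simple enough to illustrate with back-of-the-envelope arithmetic. Every number below is invented for illustration and is not drawn from the paper; the sketch only shows how a blended fleet of cheap older GPUs plus a newer accelerator can land at or below a latest-generation fleet on cost per token.

```python
# Back-of-the-envelope cost-per-throughput comparison, in the spirit of the
# arXiv paper's TCO claim. All prices and throughput figures are invented
# for illustration; none are drawn from the paper or from Gimlet.

def cost_per_million_tokens(hourly_cost: float, tokens_per_sec: float) -> float:
    """Dollars to generate one million tokens at a sustained rate."""
    return hourly_cost / (tokens_per_sec * 3600) * 1_000_000

# Homogeneous fleet: latest-generation GPUs only (hypothetical figures).
new_gpu = cost_per_million_tokens(hourly_cost=12.0, tokens_per_sec=900)

# Heterogeneous fleet: cheap older GPUs handle the phases they are good at,
# a newer accelerator handles the rest; blend by each fleet's share of tokens.
old_gpu = cost_per_million_tokens(hourly_cost=2.5, tokens_per_sec=250)
accel   = cost_per_million_tokens(hourly_cost=9.0, tokens_per_sec=1100)
mixed   = 0.5 * old_gpu + 0.5 * accel

print(f"homogeneous latest-gen: ${new_gpu:.2f} per 1M tokens")
print(f"heterogeneous mix:      ${mixed:.2f} per 1M tokens")
```

The arithmetic itself is not the hard part; the paper's contribution is measuring where those per-chip numbers actually fall for real inference workloads.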
Gimlet claims the approach reliably delivers three to ten times the inference speed for the same cost and power budget. That is a significant claim. It is also a company-sourced claim. No independent benchmark has been published yet. The utilization figure — that AI applications use deployed hardware only 15 to 30 percent of the time — carries the same caveat: it comes from the company's own framing. Readers should hold both numbers loosely until someone runs the test in public.
The team has a track record in adjacent territory. Five of the cofounders (Asgar, Nguyen, Azizi, Serrino, and Bartlett) previously built Pixie Labs, an open-source Kubernetes observability tool that New Relic acquired in 2020, just two months after Pixie closed a $9 million Series A led by Benchmark. Gimlet currently employs 30 people and counts one of the top three frontier labs and one of the top three hyperscalers among its customers, according to the press release. The company says it is already generating eight-figure revenue.
The broader context is a quiet but real shift in how the industry thinks about AI hardware. The dominant narrative for the past two years has been: buy more GPUs. Gimlet is betting that the harder problem — and the cheaper one — is using what you already have, better. Whether the economics actually work at scale, and whether the company can deliver consistent results across heterogeneous hardware without the integration headaches that have historically plagued multi-vendor AI deployments, are the questions that will determine whether this round marks a real inflection or a well-funded press release.
Gimlet Labs is not the only company chasing the heterogeneous compute angle. The idea of mixing chip architectures for AI workloads has circulated in systems research for years; the arXiv preprint gives it a quantitative footing. What the funding and the customer logos suggest is that at least one top-tier frontier lab and one hyperscaler are taking the bet seriously enough to sign contracts. That is worth watching, not as confirmation of the speedup claims, but as a signal that the industry's hardware-mix optimization problem is real enough to write checks for.