DeepInfra Revenue Tripled in Five Months. The Reason Should Worry the Hyperscalers.
DeepInfra has a paradox at the center of its business.
The company — a Palo Alto inference cloud startup that just closed a $107 million Series B — is processing roughly five trillion tokens per week. Revenue tripled in the five months through May. Nearly 30 percent of that volume comes from agentic systems, automated pipelines that run tens or hundreds of model calls per task without a human in the loop. That is the number worth sitting with. DeepInfra's success is also a preview of its structural problem: the more inference becomes a commodity consumed by always-on agents, the more the durable profits shift upstream — to the GPU providers and the foundation model companies that DeepInfra depends on but cannot control.
The company runs more than 190 open-source models across eight U.S. data centers and is currently deploying Blackwell GPUs with NVIDIA's Dynamo inference software, which delivers roughly 20 times the cost efficiency of the prior generation, according to the May 4 press release. The Series B was co-led by 500 Global and Georges Harik, one of Google's earliest engineers, with NVIDIA and Samsung Next among the participants. DeepInfra did not disclose its valuation.
Agentic inference — running AI model queries autonomously, without waiting for a human to request each step — is structurally different from the bursty, user-initiated API queries that defined the first wave of large language model adoption. Agents are always on. They compound. A single automated task can generate the token volume of thousands of human users, and it does so continuously. The workload profile is high-volume, distributed, latency-sensitive, and cost-optimized — a combination that DeepInfra argues general-purpose GPU cloud was never designed to handle efficiently.
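The "thousands of human users" claim is easy to sanity-check with back-of-envelope arithmetic. A minimal sketch, where every per-task and per-user figure is an illustrative assumption (the article states only the qualitative claim, not these numbers):

```python
# Illustrative comparison of agentic vs. human-initiated token volume.
# All figures below are assumptions for the sketch, not DeepInfra's data.

CALLS_PER_TASK = 100        # assumed: model calls in one agentic task
TOKENS_PER_CALL = 2_000     # assumed: prompt + completion tokens per call
TASKS_PER_DAY = 500         # assumed: throughput of one always-on pipeline

QUERIES_PER_USER_DAY = 20   # assumed: a heavy interactive user's daily queries
TOKENS_PER_QUERY = 1_000    # assumed: tokens per interactive query

agent_tokens_per_day = CALLS_PER_TASK * TOKENS_PER_CALL * TASKS_PER_DAY
user_tokens_per_day = QUERIES_PER_USER_DAY * TOKENS_PER_QUERY

print(f"agent pipeline: {agent_tokens_per_day:,} tokens/day")   # 100,000,000
print(f"human user:     {user_tokens_per_day:,} tokens/day")    # 20,000
print(f"one pipeline is roughly {agent_tokens_per_day // user_tokens_per_day:,} users")
```

Under these assumptions a single pipeline emits the daily volume of about 5,000 heavy interactive users, which is the compounding effect the paragraph describes.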
"What we built for was the model serving infrastructure that doesn't look like the first wave of AI compute," CEO Nikola Borisov said in the company's May 4 press release. "Most cloud platforms weren't built for this always-on, distributed model."
The company attributes that nearly-30-percent share of weekly token volume to agents like OpenClaw. Deloitte estimates inference could account for roughly two-thirds of all AI compute consumed this year — a figure that frames the size of the bet DeepInfra is making on infrastructure designed for that shift.
DeepInfra was founded in 2022 by the team behind imo, a messenger app that scaled to more than 200 million users globally. That background — operating distributed systems at scale, owning hardware, optimizing cost at the infrastructure layer — is how Borisov frames the company's defensibility. "We built DeepInfra from the ground up to deliver better economics, performance, and security," he said.
500 Global's investment thesis is explicit: infrastructure will be as defining a category as the models themselves. "Enterprises and developers building with open source and agent-driven AI need infrastructure that was designed to be flexible, fast and reliable," managing partner Tony Wang said in a statement. The firm cited OpenClaw and AutoResearch as examples of the workflows driving new inference demand.
On cost, DeepInfra sits at the low end of the independent inference providers: its gpt-oss-120B pricing runs $0.08 per million tokens blended, compared with $0.26 for Groq and $0.35 to $0.75 for Cerebras, according to Infrabase.

The inference market is not empty. AWS, Google Cloud, and Azure all offer managed inference. CoreWeave has built a GPU cloud business partly on inference demand. Hyperscalers have the advantage of existing customer relationships and bundled pricing. What DeepInfra is betting on is that the agentic workload profile is different enough from general-purpose cloud that a purpose-built provider can win on economics without fighting for the hyperscaler customer list.
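At agentic volume, the quoted per-token gap compounds quickly. A back-of-envelope sketch applying the per-million-token rates above to the article's roughly five-trillion-tokens-per-week figure (purely illustrative: real bills depend on model mix, context lengths, and negotiated rates):

```python
# Weekly cost of ~5T tokens at each provider's quoted gpt-oss-120B rate.
# Illustrative only: assumes the entire volume is billed at one flat rate.

TOKENS_PER_WEEK = 5_000_000_000_000  # ~5 trillion, per the article

rates_per_million = {
    "DeepInfra": 0.08,           # $/1M tokens, blended, per Infrabase
    "Groq": 0.26,
    "Cerebras (low end)": 0.35,
}

for provider, rate in rates_per_million.items():
    weekly_cost = TOKENS_PER_WEEK / 1_000_000 * rate
    print(f"{provider:>20}: ${weekly_cost:,.0f}/week")
```

At those rates the same volume costs about $400,000 a week on DeepInfra versus $1.3 million on Groq, which is the kind of spread that matters to a customer running agents around the clock.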
Whether that bet holds depends on how fast agentic infrastructure demand grows relative to the hyperscalers' ability to optimize for it. The revenue trajectory suggests the market is moving in DeepInfra's direction. The $107 million gives it runway to find out how far that momentum carries.