Goldman Is Bullish on AI Cash Flow. Its Own Chip Forecast Tells You When.
The bank's 24x token consumption forecast rests on a cost curve the same report says won't kick in for another 12 to 18 months.
Goldman Sachs Research is telling investors that AI agents will multiply global LLM token consumption roughly 24x by 2030, that inference costs are falling 60 to 70% per year, and that hyperscaler margins are about to inflect. The same research note, read past the headline, says high-end semiconductors will be in short supply for the next 12 to 18 months. That gap is the story.
The bullish case, in Jim Schneider's equity research note for Goldman Sachs Research, is straightforward. Token volumes are about to explode as agentic AI moves from chatbots to autonomous task execution. Schneider, a senior equity analyst at the bank, models global LLM token consumption rising from roughly 5 quadrillion per month today to about 120 quadrillion per month by 2030. The consumer side accounts for about 12x of that growth (online shopping agents, smartphone-takeover assistants), with enterprise adoption layered on top. Daily LLM queries, in his model, grow at roughly a 40% CAGR to about 11 billion per day by 2030.
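The headline multiple is simple division; the annual rate behind it is less obvious. The sketch below backs out the implied growth rates, assuming "today" means 2025 and a five-year run to 2030. That base year is an assumption rather than a figure from the note, and the implied current query count is derived arithmetic, not a reported number.

```python
# Sketch of the growth arithmetic behind Schneider's token forecast.
# ASSUMPTION: "today" is treated as 2025, giving a five-year horizon to 2030.
# The Goldman note's actual base year may differ.

HORIZON_YEARS = 5  # hypothetical: 2025 -> 2030


def implied_cagr(multiple: float, years: int) -> float:
    """Compound annual growth rate that turns 1.0 into `multiple` over `years`."""
    return multiple ** (1 / years) - 1


def implied_base(end_value: float, cagr: float, years: int) -> float:
    """Starting value that grows to `end_value` at `cagr` over `years`."""
    return end_value / (1 + cagr) ** years


# Tokens per month, from the note: ~5 quadrillion today, ~120 quadrillion by 2030.
tokens_now, tokens_2030 = 5e15, 120e15
token_multiple = tokens_2030 / tokens_now
token_cagr = implied_cagr(token_multiple, HORIZON_YEARS)

# Daily queries, from the note: ~40% CAGR reaching ~11 billion per day by 2030.
queries_now = implied_base(11e9, 0.40, HORIZON_YEARS)

print(f"tokens: {token_multiple:.0f}x, about {token_cagr:.0%} per year")
print(f"implied daily queries today: about {queries_now / 1e9:.1f} billion")
```

Under that five-year assumption, 24x works out to token volume nearly doubling every year; a shorter horizon would imply an even steeper rate.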
The mechanism Schneider uses to get from "more tokens" to "more cash flow for tech" runs through unit economics. Inference cost per token is declining 60 to 70% per year as chip designs and data-center architecture improve. Falling unit cost plus rising volume equals rising gross margin, provided per-token prices fall more slowly than per-token costs. Rising gross margin gives hyperscalers (the cloud platforms whose capex buildout is the most visible bet on AI) the operating cash flow headroom to keep spending on chips, data centers, and power. Schneider frames the next 3 to 12 months as the window for that margin inflection to show up in earnings.
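Schneider's mechanism comes down to volume multiplied by a per-token spread. The sketch below is an illustration under stated assumptions, not Goldman's model: the 24x volume multiple and the 65% midpoint of the cost decline come from the note, while the smooth five-year compounding, a starting price of twice cost, and a 40% annual price decline are hypothetical placeholders.

```python
# Sketch of the unit-economics mechanism; an illustration, not Goldman's model.
# From the note: token volume rises ~24x by 2030 and cost per token falls 60-70%/yr.
# HYPOTHETICAL assumptions: a five-year horizon with smooth compounding, a starting
# price of 2x cost per token, and prices falling 40%/yr (slower than costs).

YEARS = 5
VOLUME_GROWTH = 24 ** (1 / YEARS)  # ~1.89x per year, compounding to 24x
COST_DECLINE = 0.65                # midpoint of the note's 60-70% range
PRICE_DECLINE = 0.40               # hypothetical; not a figure from the note


def project(years=YEARS, start_cost=1.0, start_price=2.0):
    """Yield (year, volume, gross_margin, gross_profit), indexed to year 0."""
    volume, cost, price = 1.0, start_cost, start_price
    for year in range(years + 1):
        gross_margin = 1 - cost / price          # share of revenue left after cost
        gross_profit = volume * (price - cost)   # volume times per-token spread
        yield year, volume, gross_margin, gross_profit
        volume *= VOLUME_GROWTH
        cost *= 1 - COST_DECLINE
        price *= 1 - PRICE_DECLINE


for year, volume, margin, profit in project():
    print(f"year {year}: volume {volume:4.1f}x, "
          f"gross margin {margin:.0%}, gross profit {profit:.2f}x")
```

The result hinges on the gap between the two decline rates. Set `PRICE_DECLINE` equal to `COST_DECLINE` and the margin stays flat; set it higher and the margin shrinks even as volume climbs.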
This is where his own chip-supply analysis gets in the way. Schneider flags a 12 to 18 month shortage of high-end semiconductors, with full catch-up taking about two years. The reason is capacity lag: the industry's fab capacity was sized for demand as it looked six months ago, while agentic AI use cases have moved faster than the supply curve. The bottleneck sits directly on the path between "falling inference cost" and "rising hyperscaler cash flow."
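To size what a shortage does to timing, the sketch below assumes, hypothetically, that the cost decline stops outright while high-end chips are scarce and then resumes at the 65% midpoint. The note does not say the decline halts; a partial slowdown would land somewhere between these paths.

```python
# Hypothetical sketch: how a supply pause would shift the cost curve.
# ASSUMPTION: the 60-70%/yr decline (65% midpoint used here) halts entirely
# while chips are scarce, then resumes at the same rate. The Goldman note does
# not say the decline stops; this only sizes the timing effect.

ANNUAL_DECLINE = 0.65


def unit_cost(month: int, pause_months: int = 0) -> float:
    """Cost per token relative to today after `month` months, with a
    hypothetical `pause_months` stretch during which costs stay flat."""
    effective_months = max(0, month - pause_months)
    return (1 - ANNUAL_DECLINE) ** (effective_months / 12)


for pause in (0, 12, 18):
    print(f"pause {pause:2d} months -> cost at month 24: {unit_cost(24, pause):.2f}")
```

Under these assumptions, a 12-month pause leaves unit cost at month 24 where the undelayed curve would have put it at month 12, nearly three times the on-schedule level; an 18-month pause leaves it nearly five times higher.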
Goldman's research makes a point that trade press coverage has flattened. The wire version reads "AI agents will boost tech cash flow." PYMNTS's coverage of the same report leans into the token-volume headline and adds its own survey data showing enterprise agentic AI moving from piloting to production through 2026. Both treatments are accurate. Neither emphasizes the sequencing problem Schneider himself puts on the page.
Schneider is not contradicting himself. He is laying out a plan, not a punchline. By his own account, hyperscaler free cash flow has been compressed by the capex required to build out AI infrastructure. What fixes that is the unit-economics math above. The 12 to 18 month chip gap is the delay built into that plan: short enough that the long-run thesis survives, long enough that anyone timing capex, contracts, or adoption in 2026 needs to read the whole report.
Three audiences get a different clock from this story.
Investors watching hyperscaler capex cycles should treat the next 12 to 18 months as a margin-stress window, not a margin-expansion window. Schneider's 3 to 12 month inflection depends on inference costs falling fast enough to offset the cost of constrained supply. If chip allocations favor one buyer over another, or if power and permitting bottlenecks extend the timeline, the inflection slips.
Enterprise buyers timing agentic AI adoption should expect the cost model Goldman projects to lag their contracts by roughly a year. Schneider's 60 to 70% annual cost decline is a supply curve, not a price quote. The buyer who signs an agentic-AI platform deal in early 2026 is buying at today's supply-constrained cost, not at the 2027 unit price the Goldman model assumes.
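The contract-timing point reduces to one ratio: today's cost divided by the cost a year out. The sketch below assumes, hypothetically, that prices track the cost curve one for one and that the curve runs on schedule; neither is a claim from the Goldman note.

```python
# Hypothetical sketch: what a rate locked at today's cost looks like a year later
# if the cost curve runs on schedule. The 60-70%/yr figure is a cost decline, not
# a price quote; this sketch ASSUMES, for illustration, that prices follow it.


def cost_after(months: int, annual_decline: float) -> float:
    """Unit cost relative to today after `months` of compounding decline."""
    return (1 - annual_decline) ** (months / 12)


def locked_premium(months: int, annual_decline: float) -> float:
    """Ratio of a rate locked at today's cost to the projected cost `months` out."""
    return 1 / cost_after(months, annual_decline)


for decline in (0.60, 0.70):
    print(f"{decline:.0%} decline: locked rate is "
          f"{locked_premium(12, decline):.1f}x the projected cost after 12 months")
```

On schedule, a buyer who signs now pays 2.5x to 3.3x what the same unit would cost a year later; the figures are only as good as the assumption that prices follow costs.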
Policymakers tracking AI infrastructure demand get a more concrete map than the usual trend line. Power utilities, permitting authorities, and water-use regulators facing data-center siting decisions can read Schneider's two-year supply catch-up as a real planning horizon. The 24x token-growth number is not just a market-size story. It is a power, land, and water story with a 12 to 18 month leading indicator.
The numbers in the Goldman note are attributed forecasts, not market data. Schneider's projections (24x tokens by 2030, 60 to 70% annual cost decline, 12% knowledge-worker adoption by 2030 rising to 37% by 2040, 40% query CAGR) come from a single published note. The mechanism inside them is internally consistent. The timing inside them is the part the headline doesn't foreground.
The thing to watch next is whether the 3 to 12 month margin inflection shows up in hyperscaler Q3 and Q4 earnings, or whether the chip-supply window pushes it into 2027. If earnings call transcripts start using the phrase "supply-constrained inference" the way they used "supply-constrained GPUs" in 2023, the Goldman timing problem has just become the market's.