The Pricing Cliff Hiding in Google I/O's Flash Tier

The Pricing Cliff Hiding in Google I/O's Flash Tier — type0 | type0

PREVIEWThe Pricing Cliff Hiding in Google I/O's Flash Tier · MD

Google spent much of its I/O 2026 keynote on stage conversations — filmmakers, scientists, and technologists talking about what AI means for their fields. Fine. But behind the Dialogues stage, Google did something more consequential: it tripled the price of the model developers use most.

Google DeepMind Blog launched Gemini 3.5 Flash on May 19 at $1.50 per million input tokens and $9 per million output tokens. According to Simon Willison, that is three times what Gemini 3 Flash Preview cost, and six times what Gemini 3.1 Flash-Lite cost. Artificial Analysis confirms it: 5.5 times costlier to run than 3 Flash on their benchmark suite. This is not a minor adjustment. It is a repricing of the budget tier that hundreds of thousands of developers built their workflows around.

For a developer running 10 million input tokens per day through an AI coding assistant, that translates to roughly $450 per month at the new rate versus $150 at the old one — a $300 monthly jump, or $3,600 per year, before output token costs. For a startup running multiple agents across a development team, the delta compounds fast.

Here is the part that should get more attention: Google is also rolling Gemini 3.5 Flash into its free consumer products. The same tier. At the new higher price. Google is absorbing the increase on the free side while developers pay the premium via API. That is the pricing architecture of a company that has decided the cheap era of its own Flash model is over.

The capability jump is real. On Terminal-Bench 2.1, which tests how well models handle command-line and devops tasks, Gemini 3.5 Flash scored 76.2% versus Gemini 3.1 Pro's 70.3%. On MCP Atlas, a benchmark for model context and agentic workflows, it hit 83.6% versus 78.2%, according to LLM Stats. It runs at 278 output tokens per second — roughly four times faster than GPT-5.5 or Opus 4.7 at comparable intelligence levels, according to Appwrite's benchmark analysis. Google DeepMind's own blog calls it "our strongest agentic and coding model yet."

But the developer community on Hacker News is noting a crack in that framing. Gemini 3.5 Flash is exceptional at one-shot coding — a hard problem solved in a single prompt. What it does less well, observers note, is long-horizon agentic tasks with arbitrary tool availability. The kind of multi-step, error-correcting loop that agentic workflows actually require. One HN commenter with model-serving expertise estimated the model at 250-300 billion total parameters with 10-16 billion active — a sparse mixture-of-experts architecture running on TPU 8i. If accurate, that explains the speed but also the boundary: tasks requiring sustained dynamic tool use may still trip it up in ways a larger dense model does not.

The benchmark scores also deserve scrutiny. On Appwrite's analysis of the Artificial Analysis benchmark suite, Gemini 3.5 Flash scores 1.9 points lower than Gemini 3.1 Pro on the Intelligence Index — 55.3 versus 57.2 — while costing 42% more per evaluation run. The model Google is charging more for is, by that measure, slightly less capable than the one it replaced at the lower price point.

None of this means the price hike is unjustified. Google is now processing 3.2 quadrillion tokens per month, up sevenfold from 480 trillion a year ago, according to numbers Google shared at I/O. That is a compute arms race being won at the infrastructure level, and infrastructure costs money. If the model is genuinely better at the tasks developers actually care about, the market may accept the premium.

Whether the rest of the industry follows Google upward is the real open question. OpenAI's GPT-4o-mini and Anthropic's Haiku-tier models remain priced lower than Google's new Flash rate, according to Investor's Business Daily's analysis of current public pricing — which makes Google an outlier repricing upward for now. If competitors hold, Google's developers have a clear incentive to shop around. If they follow, the era of cheap AI API access closes for everyone.

The Dialogues stage made for good theater. The pricing table is what matters.

The Pricing Cliff Hiding in Google I/O's Flash Tier

Sources