The Real Story Hiding in Google I/O’s Most Important Panel
The chatbot got all the coverage. The infrastructure argument deserved better.
Three days after Google wrapped I/O 2026, every headline was about Gemini 3.5 Flash: the new model, the redesign, the features. The genuinely interesting technical conversation happened the next day, in a panel that almost nobody put in the lede.
The panel was called "Defining the agentic AI era." It featured Jeff Dean, Google's chief scientist; Koray Kavukcuoglu, its DeepMind CTO; Liz Reid, who runs Search; and Josh Woodward, who runs the Gemini app and Google Labs. The Google for Developers YouTube video runs forty-one minutes. It is the most honest technical discussion Google has given publicly in years.
The thing Dean said that nobody else picked up was this: "If you make the model infinitely fast, Amdahl's law says if you're spending half your time in tools, you're not going to get anything better than 2x speedup."
Amdahl's law is a formula from parallel computing that describes the theoretical limit on how much you can speed up a system by improving one component. Dean's point, stated plainly, was that as Google makes its models faster (and it is making them much faster), the bottleneck shifts to the tools those models call. File systems. Databases. APIs. Browser state. The infrastructure that humans built to be used at human speed.
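To make Dean's arithmetic concrete, here is a minimal Python sketch of Amdahl's law. The function name and parameters are illustrative, not anything Google showed; the only input taken from the panel is the "half your time in tools" split.

```python
import math


def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of total runtime
    becomes `factor` times faster and the rest is left unchanged."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    untouched = 1.0 - accelerated_fraction
    # With factor = math.inf the accelerated share of time drops to zero.
    accelerated = accelerated_fraction / factor
    remaining = untouched + accelerated
    if remaining == 0.0:
        return math.inf  # everything was accelerated infinitely
    return 1.0 / remaining


# Dean's case: half the time is model inference, half is tools,
# and the model becomes infinitely fast.
model_only = amdahl_speedup(0.5, math.inf)  # 2.0
```

With the model infinitely fast, the untouched tool half caps the total at 2x. The same formula cuts the other way: under the same assumed 50/50 split, making only the tools 10x faster gives `amdahl_speedup(0.5, 10)`, about 1.8x, which is why neither side can be optimized alone.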
Dean offered a concrete example. Google had been running its own internal engineering tasks through AI models. The models were fast. The Python scripts they were calling were slow. So Google used AI to rewrite those Python tools in Go. The rewrite took, by Dean's account, a single night of automated work. The result was a ten-to-twenty-times speedup.
That is the I/O 2026 story nobody wrote.
The infrastructure bet nobody asked about
Sundar Pichai said on stage that Google expects to spend $180 to $190 billion in capital expenditures this year. In 2022, that number was $31 billion. That is roughly a sixfold increase in four years. The framing from Google's communications team is that this buys more compute, more capacity, more model training.
Dean's panel suggests a different read. The compute is necessary but not sufficient. The tools have to come with it. Google's infrastructure bet is not just on the models — it is on the entire stack underneath them, including the parts that have been largely ignored in the AI conversation because they are not glamorous and they do not fit on a benchmark slide.
Kavukcuoglu reinforced this in describing what the 3.5 Flash team actually optimized. "The biggest thing that we really wanted to focus on was getting coding and agentic workflows much, much better," he said, as quoted in Google's announcements post. Not raw intelligence. Not a better benchmark score. Coding and agentic workflows: the thing the model does when it calls other tools to get work done.
The scale of what Google is running is not abstract. Pichai also disclosed that Google is now processing over 3.2 quadrillion tokens per month, nearly a 7x increase from roughly 480 trillion a year earlier. Those numbers are not in the keynote talking points. They ended up in a blog post footnote.
The Spark design decision that tells you everything
Woodward described Gemini Spark, the always-on background agent Google is building into the Gemini app and Workspace. It runs tasks when you're not watching. It sends you a notification when it's done. It will draft an email response, label it high-priority, and have it ready for your review.
Then Reid, the Search lead, cut in with what sounds like a legal disclaimer but is actually the most important product design statement of the week: "Very important, don't send the email."
Spark does not act. It prepares. It drafts. It waits. The human approves.
This is not a limitation of the technology. It is a deliberate choice about where to put the agency boundary in an agentic system. Google is building a world where AI runs continuously in the background, but the accountability loop stays with the person. That is a specific architectural decision with significant implications for how enterprise software, legal liability, and human oversight will work as agentic systems become infrastructure.
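One way to picture that agency boundary is as an approval gate in code. The Python sketch below is purely illustrative: `Draft`, `ReviewQueue`, and the injected `send` callable are hypothetical names, and nothing here reflects how Spark is actually built. It only shows the general pattern of letting an agent prepare work while reserving the side effect for a human-triggered call.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Draft:
    recipient: str
    body: str
    status: Status = Status.PENDING_REVIEW


@dataclass
class ReviewQueue:
    # The side-effecting action (hypothetical), supplied by the caller.
    send: Callable[[Draft], None]
    drafts: List[Draft] = field(default_factory=list)

    def prepare(self, recipient: str, body: str) -> Draft:
        """Agent-side entry point: stores a draft and never calls send."""
        draft = Draft(recipient, body)
        self.drafts.append(draft)
        return draft

    def approve(self, draft: Draft) -> None:
        """Human-side entry point: the only path that triggers send."""
        if draft.status is not Status.PENDING_REVIEW:
            raise ValueError("draft has already been reviewed")
        draft.status = Status.APPROVED
        self.send(draft)

    def reject(self, draft: Draft) -> None:
        """Human-side entry point: closes the draft without sending."""
        if draft.status is not Status.PENDING_REVIEW:
            raise ValueError("draft has already been reviewed")
        draft.status = Status.REJECTED
```

In this sketch the agent can call `prepare` as often as it likes, but nothing leaves the queue until a person calls `approve`. The accountability loop is enforced by which method is exposed to whom, not by the model's judgment.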
The latency model that changes what "fast" means
Reid described a new way to think about search speed that also explains the broader agentic design philosophy. The old model was: faster is always better, sub-second response, minimal latency. The new model is: it depends on what the user was going to do with their time.
"If the user would have spent fifteen or twenty minutes doing it, you can have ten seconds, for sure, if you can do something amazing," she said. The example she gave was a weekend trip planner — a task where waiting a minute is acceptable if you were going to spend twenty minutes doing it manually.
This reframes the optimization target. Google is no longer engineering for single-response latency. It is engineering for task-completion time, with the understanding that the user is not sitting and waiting for the AI to finish. They have moved on. They will come back.
The tool bottleneck is not theoretical
Kavukcuoglu was direct about what comes next. Google's internal engineering environment is heavily automated, but it was built for human cadence — humans write code, humans review it, humans deploy it. Agents operating at machine speed will need that infrastructure re-engineered. "Tools get faster, and then the models work better," he said. "And then that is another cycle."
The implication is that Google has identified its own tooling as a strategic risk. The company that has spent the last four years convincing the world that it has the best models is now quietly saying that the next bottleneck is not the models. It is everything around them.
That is worth writing about.