Databricks' open bet on agents: one data foundation for every AI worker

The first time an enterprise wires an AI agent into its customer database, the data team usually spends a week building the agent a private copy of the data, a separate identity for it, a logging pipeline for its queries, and a way to revoke all of it when something breaks. By the third agent, most teams are running a small zoo of these pipelines. By the tenth, the data plumbing has become the single largest line item in the agent roadmap.

This is the bottleneck that Matei Zaharia, chief technologist at Databricks and the creator of Apache Spark, wants to attack. In a wide-ranging conversation on Latent Space recorded at the 2026 Data + AI Summit, Zaharia and Databricks cofounder Reynold Xin laid out a thesis that runs against the grain of the current AI-agent discourse: the agent era will not be won in the models. It will be won at the substrate.

The argument runs like this. Coding agents, analyst agents, customer-service agents, and the long tail of vertical AI workers are about to hit the same four problems that enterprise software has been failing to solve for two decades: portability between vendors, shared session history, real security and audit trails, and a way to keep compute spend from running away. Zaharia's view, articulated repeatedly in the interview, is that those problems are not going to be solved inside any single agent. They will be solved at the layer underneath the agents, in a shared data substrate and a shared control plane.

That is the strategic frame behind three Databricks announcements that, taken individually, look like standard product news. Taken together, they sketch the architecture of the bet.

Omnigent, released as open-source software under the Apache 2.0 license on GitHub and introduced on Databricks' blog, is the agent-harness piece. It is a meta-layer that sits above individual coding and reasoning agents (Claude Code, OpenAI's Codex, Cursor, Pi, and any custom agent SDK) and handles the cross-cutting concerns: live collaboration between humans and agents, multi-device session continuity, policy controls such as pause-for-approval and tool limits, and spend caps. Zaharia's launch announcement on X frames it as the missing control plane for the agent era. Cloud sandboxes for Omnigent run on Modal, Daytona, and Islo, which is a deliberately vendor-neutral choice at the infrastructure level.
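To make the cross-cutting controls concrete, here is a minimal sketch of how a harness-level policy check could work. It is an illustration only, not Omnigent's actual API: the names `Policy`, `Session`, `Decision`, and `request`, along with the tool names and dollar figures, are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    PAUSE_FOR_APPROVAL = "pause_for_approval"
    DENY = "deny"


@dataclass(frozen=True)
class Policy:
    # Hypothetical policy shape: one spend cap, one tool-call limit,
    # and a set of tools that always need a human sign-off.
    spend_cap_usd: float
    max_tool_calls: int
    approval_required: frozenset = frozenset()


@dataclass
class Session:
    # One shared session that every agent goes through, so limits and
    # the audit trail live above any individual agent.
    policy: Policy
    spent_usd: float = 0.0
    tool_calls: int = 0
    audit_log: list = field(default_factory=list)

    def request(self, agent: str, tool: str, est_cost_usd: float) -> Decision:
        if self.spent_usd + est_cost_usd > self.policy.spend_cap_usd:
            decision = Decision.DENY  # would exceed the spend cap
        elif self.tool_calls >= self.policy.max_tool_calls:
            decision = Decision.DENY  # tool-call limit reached
        elif tool in self.policy.approval_required:
            decision = Decision.PAUSE_FOR_APPROVAL  # wait for a human
        else:
            decision = Decision.ALLOW
            self.spent_usd += est_cost_usd
            self.tool_calls += 1
        # Every request is logged, whatever the outcome.
        self.audit_log.append((agent, tool, decision.value))
        return decision


policy = Policy(
    spend_cap_usd=1.00,
    max_tool_calls=3,
    approval_required=frozenset({"drop_table"}),
)
session = Session(policy)
results = [
    session.request("agent_a", "run_query", 0.40),
    session.request("agent_b", "drop_table", 0.10),
    session.request("agent_a", "run_query", 0.70),
]
```

In this sketch the first request is allowed, the second pauses for approval, and the third is denied because it would push spending past the cap. The point is that the checks and the audit log belong to the shared session rather than to either agent.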

LTAP, or Lake Transactional/Analytical Processing, is the data-substrate piece. Databricks calls it the first architecture that unifies online transaction processing (OLTP) and online analytical processing (OLAP) on a single lake copy. The traditional approach to unifying transactional and analytical workloads is change data capture (CDC), which Xin described on Latent Space as "continuous data corruption" once workload isolation collapses. LTAP instead splits transactional and analytical workloads at the storage layer so they can scale independently, while sharing one governed copy of the data underneath. The press release cites Lakebase, Databricks' transactional database, as the production instance. Lakebase is built on Postgres (the open-source database standard) alongside the Iceberg and Delta open table formats, and the release claims it is already serving thousands of customers and handling 12 million database launches per day. Both of those numbers are vendor-reported.
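The difference between the two approaches can be shown with a toy model. The sketch below is not Databricks' implementation and says nothing about how LTAP works internally. It only contrasts a two-copy CDC setup, where the analytical replica is stale until the change log is replayed, with a single shared copy that both workloads read. The class names and order values are hypothetical.

```python
class CdcSetup:
    """Two copies: a transactional store plus an analytical replica
    that is only updated when the change log is replayed."""

    def __init__(self):
        self.oltp = {}
        self.replica = {}
        self.change_log = []

    def write(self, key, value):
        self.oltp[key] = value
        self.change_log.append((key, value))  # captured, not yet applied

    def replay(self):
        for key, value in self.change_log:
            self.replica[key] = value
        self.change_log.clear()

    def analytical_sum(self):
        return sum(self.replica.values())  # reads the replica only


class SharedCopySetup:
    """One copy: transactional writes and analytical reads hit the
    same underlying storage, so there is nothing to replay."""

    def __init__(self):
        self.storage = {}

    def write(self, key, value):
        self.storage[key] = value

    def analytical_sum(self):
        return sum(self.storage.values())


cdc = CdcSetup()
shared = SharedCopySetup()
for setup in (cdc, shared):
    setup.write("order-1", 100)
    setup.write("order-2", 250)

stale_total = cdc.analytical_sum()      # replica has not been updated yet
shared_total = shared.analytical_sum()  # sees both writes immediately
cdc.replay()
caught_up_total = cdc.analytical_sum()  # replica matches after replay
```

Here the CDC replica reports a total of 0 until `replay()` runs, while the shared copy reports 350 straight away. A real system also has to isolate compute so analytical scans do not slow transactions, which this toy ignores.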

Genie is the proof point that the substrate thesis can produce agent behavior, not just infrastructure. It is Databricks' data-analysis agent, and the engineering team's blog post describes a multi-step design (specialized knowledge search, parallel thinking, a multi-LLM architecture) that took its accuracy on Databricks' internal benchmark from 32 percent to above 90 percent. Zaharia also claims a roughly 3x accuracy advantage over generic coding agents on certain tasks. The benchmark is internal and the comparison class is narrow, so the right read is that the agent got dramatically better inside Databricks' own environment, not that it has lapped the field.

The deeper claim, though, is cross-cutting. Coding agents and enterprise agents, in Zaharia's framing, hit the same problems for the same reasons. They are stateless in the wrong places, they cannot share context, they cannot be governed, and they cannot be cost-controlled. The fix is not a better agent. The fix is a shared substrate: an open agent-harness API on top, an open data layer underneath, and open formats (Iceberg and Delta for tables, MLflow for machine-learning workflows, DSPy for programming language models) as the connective tissue. That is the architectural shape of the "frontier ecosystem" framing Satya Nadella has been using in public remarks, and it is also a competitive bet. The company that owns the substrate owns the agent era.

That last sentence is where the critique lives. "Open frontier ecosystem" is also Databricks' market position. The company is now reportedly valued at $175 billion, a figure that originates with the Latent Space conversation's intro framing rather than an audited filing. Open-sourcing Omnigent under Apache 2.0 is genuinely unusual for a company in Databricks' position. So is basing LTAP on the trio of Postgres, Iceberg, and Delta, three open standards, none of which Databricks owns outright. But the substrate thesis is also exactly the layer where Databricks competes most directly with Snowflake, Google BigQuery, and the hyperscalers' native agent stacks. Calling that layer "open" is not the same thing as it being unowned.

The watch item, then, is whether the open parts of the bet actually pull in independent ecosystem adoption, or whether they end up functioning as a defensible moat around Databricks' own lakehouse. The 12 million daily Lakebase launches are real Databricks usage, not open adoption. The GitHub star count on Omnigent in the months after release is a cleaner signal. So is the number of third-party agents running on Omnigent versus on a competitor harness. By the time of the 2027 Data + AI Summit, those numbers will be the only ones that matter for the substrate thesis.
