Agentic AI is becoming the backbone of cloud workloads, and right now it carries an efficiency tax that nobody has been able to dial back. A team from MIT and Microsoft says it has built a system that cuts that tax automatically by picking the right combination of models, tools, and hardware for every job.
The research, announced by MIT News on June 25, 2026, is built around a concrete diagnosis. An "agentic workflow" is a multi-step AI task that strings together several models and external tools, such as a database lookup or a Python interpreter, to complete a job. Cloud providers have been deploying them rapidly, but the default way of running them is wasteful. Lead author Gohar Chaudhry, an MIT EECS graduate student, put it this way: "agentic workflows are becoming the backbone of cloud providers' work, energy use is a major concern, and giving the cloud provider resource-optimal orchestration is a win for everyone."
The waste comes from a familiar pattern. Operators tend to over-allocate compute, picking the largest model, the most expensive hardware, and the most generous resource budget, just to be safe. The result is that most agentic jobs run on more compute than they actually need, and the unused cycles still cost money and electricity.
The new system, called Murakkab, attacks the problem at the deployment layer rather than the model layer. A developer describes what they want in plain language. Murakkab then picks the right combination of models, external tools, and hardware, and decides how much cloud capacity to allocate when the job actually runs. If the operator changes their mind and wants the job faster or cheaper, the system re-tunes the configuration on the fly based on the new priority.
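To make the re-tuning idea concrete, here is a minimal Python sketch of priority-driven configuration selection. It is not Murakkab's actual algorithm, which the MIT release does not spell out. The `Config` type, the `pick_config` function, the candidate list, and every latency and cost figure are hypothetical and chosen only for illustration. The sketch also ignores output quality, which a real orchestrator would have to preserve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One candidate deployment: a model paired with a hardware tier."""
    model: str
    hardware: str
    latency_s: float  # estimated end-to-end latency (made-up figure)
    cost_usd: float   # estimated cost per job (made-up figure)


# Hypothetical candidates; a real system would profile or predict these.
CANDIDATES = [
    Config("large-model", "gpu-large", 4.0, 0.40),
    Config("large-model", "gpu-small", 9.0, 0.22),
    Config("small-model", "gpu-small", 6.0, 0.08),
    Config("small-model", "cpu", 20.0, 0.03),
]


def pick_config(candidates, priority, max_latency_s=None):
    """Return the best candidate for the stated priority.

    priority: "latency" picks the fastest option, "cost" the cheapest.
    max_latency_s: optional upper bound that every candidate must meet.
    """
    feasible = [
        c for c in candidates
        if max_latency_s is None or c.latency_s <= max_latency_s
    ]
    if not feasible:
        raise ValueError("no configuration meets the latency bound")
    if priority == "latency":
        return min(feasible, key=lambda c: (c.latency_s, c.cost_usd))
    if priority == "cost":
        return min(feasible, key=lambda c: (c.cost_usd, c.latency_s))
    raise ValueError(f"unknown priority: {priority!r}")


fastest = pick_config(CANDIDATES, "latency")
cheapest = pick_config(CANDIDATES, "cost")
cheapest_under_10s = pick_config(CANDIDATES, "cost", max_latency_s=10.0)
```

With these made-up numbers, asking for speed selects the large model on the large GPU, asking for cost selects the small model on CPU, and asking for the cheapest option that finishes within 10 seconds selects the small model on the small GPU. Changing the priority is just a second call over the same candidates, which is the sense in which a configuration can be re-tuned without touching the workflow itself.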
In the paper behind Murakkab, accepted at the USENIX Symposium on Operating Systems Design and Implementation (OSDI), the team reports that the system cut the number of compute units needed across multiple agentic workloads with no loss in performance. The MIT release describes the gains only in general terms; the specific percentages and unit reductions are detailed in the full paper and the OSDI PDF on the lead author's site.
The authorship mix is a signal of where the work is aimed. Chaudhry is a graduate student in MIT's Department of Electrical Engineering and Computer Science. His co-author Adam Belay is an MIT EECS associate professor and a member of the Computer Science and Artificial Intelligence Laboratory. The senior author, Ricardo Bianchini, is a technical fellow and corporate vice president at Microsoft Azure, and other co-authors are also at Azure. The problem the paper targets is operational, not theoretical: real cloud deployments of agentic workflows tend to over-provision resources, and the fix needs to live at the layer where those resources are allocated.
If the approach holds up outside the lab, what changes is the cost profile of agentic AI. Today, every conversation an agent has with a database, every Python script it runs, and every retry it triggers runs on whatever hardware was reserved up front. With orchestration that adjusts in real time, the same workload can use a smaller model for an easy step and a bigger one for a hard one, and can shift the whole job to a cheaper machine when the user cares about cost more than latency.
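The per-step idea can be illustrated with a short Python sketch that compares a workflow pinned to one large model against the same workflow with each step routed by difficulty. Everything here is an assumption made for illustration: the `Step` type, the `hard` flag, the step names, and the relative cost units in `MODEL_COST` do not come from the paper, and a real orchestrator would have to estimate difficulty rather than read it from a label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One stage of an agentic workflow."""
    name: str
    hard: bool  # hypothetical difficulty label


# Relative cost units per step for each model size (made-up figures).
MODEL_COST = {"small": 1, "large": 8}


def route(step):
    """Send hard steps to the large model and easy steps to the small one."""
    return "large" if step.hard else "small"


def plan(steps):
    """Assign a model to every step in the workflow."""
    return [(step.name, route(step)) for step in steps]


def total_cost(assignments):
    """Sum the relative cost of a list of (step name, model) pairs."""
    return sum(MODEL_COST[model] for _, model in assignments)


WORKFLOW = [
    Step("parse request", hard=False),
    Step("database lookup", hard=False),
    Step("write analysis", hard=True),
    Step("format reply", hard=False),
]

adaptive = plan(WORKFLOW)                            # routed per step
static = [(step.name, "large") for step in WORKFLOW]  # reserved up front

adaptive_cost = total_cost(adaptive)
static_cost = total_cost(static)
```

Under these illustrative numbers, the statically provisioned workflow costs 32 units and the routed one costs 11, because only the analysis step pays for the large model. The real savings depend on how often steps are genuinely easy and on whether the smaller model handles them without hurting the result, which is what the paper's measurements address.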
The team frames Murakkab as a deployment efficiency win, not a smarter-agent breakthrough. The agents themselves are not getting better at their jobs. What improves is how closely the hardware and the budget they run on match what each job actually needs. The next test is whether the same mechanism survives contact with workloads and customer priorities that the paper has not yet measured, and whether cloud providers can be persuaded to ship the orchestration layer as a default rather than an option.