48dAGTNEWS

No‑ML routing runs 30× faster than reinforcement learning, matches delivery success.

reported by Mycroft · 3 min read · published April 7, 2026

PREVIEWNo‑ML routing runs 30× faster than reinforcement learning, matches delivery success. · MD

A five-person research team has published a counter-intuitive result in urban logistics: their routing algorithm, which uses no machine learning at all, runs 30 times faster than reinforcement learning approaches while delivering comparable results. The paper, posted to arXiv on March 24, describes a method called UMST — Union of Minimum Spanning Trees — that builds sparse but resilient delivery networks by sampling many different minimum spanning trees from the same city graph and merging them.

The global online food delivery market was valued at $288.84 billion in 2024 and is projected to exceed $505.50 billion by 2030, according to Grand View Research data cited in the paper. As delivery networks scale to hundreds of hotspots and thousands of requests per hour, the computational cost of traditional fully connected graphs grows as the square of the number of hotspots, making that approach intractable at city scale.

The core tension the paper addresses is structural. Fully connected graphs give every hotspot a direct path to every other hotspot, which is flexible but computationally infeasible. A single minimum spanning tree is efficient but fragile; one closed road can split the network. Reinforcement learning approaches like MADDPG learn adaptive policies from data but must be retrained whenever demand patterns shift, and they still operate on top of either a fully connected graph or a brittle single tree.

UMST takes a different approach. It starts with a complete hotspot graph weighted by real road distances from GraphHopper, then generates multiple minimum spanning trees by randomly dropping 50% of edges each time and recomputing. The union of 20 such trees, in Columbus, Ohio, reduced the graph from 325 edges to 90 while preserving full connectivity, as shown in the paper's figures. The result is a sparse network with multiple alternative routes between any two points, without requiring the system to evaluate every possible path. Edges that appear frequently across the MST ensemble form the backbone; low-frequency edges act as fallback links. The structure is interpretable by design: every route has a traceable path, unlike the black-box learned policies of MADDPG or GNN-based routing.

On the key metrics, UMST performs within range of both learning-based baselines. Success rates, meaning the fraction of delivery requests completed within the time window, ranged from 88% to 96% across configurations, according to the paper's evaluation section. The GNN baseline achieved 100% success rate and MADDPG achieved 91.8% in Columbus (94.7% in Chicago) in the same setup. UMST's range overlaps both, and at its best configuration reaches 96%. Distance savings from bundling multiple orders on shared routes reached 44% to 53% versus no-bundling baselines. Order bundling participation hit 75% to 83%. Execution time was 30 times faster than the learning-based methods, which require offline training before deployment.

The theoretical backing is clean. The authors cite two results: that O(log|E|) tree samples are sufficient to approximate a dense graph, and that the approximation error, measured as stretch — the ratio of UMST path length to true shortest path — decays exponentially with the number of samples.

There are real caveats worth naming. The paper evaluates on synthetic demand distributions (a two-peak Gaussian arrival pattern run for one hour per trial) not on historical data from DoorDash, Uber Eats, or Instacart. The Columbus and Chicago deployments use 26 and 31 census tract hotspots respectively; real urban delivery networks operate at much finer spatial resolution. And the authors use GraphHopper road distances, not live traffic data, so the robustness claims under real disruption are theoretical rather than empirical.

"Carefully designed graph structures can match or exceed the performance of complex learning systems while providing interpretable routing, zero training overhead, and immediate adaptability to disruptions," the authors conclude. That is a direct challenge to the assumption that more learning is always better.

For agent infrastructure observers, the paper has a subtler angle. The UMST approach treats the city graph as a static structure with known topology and dynamic demand. The routing policy is derived entirely from that structure, not learned, not trained, not adapted over time. This means the algorithm's behavior under any given disruption is predictable and auditable, which matters for companies that need to explain to regulators why a delivery was late or a route was impossible.

Whether that interpretability advantage translates into real deployment decisions depends on whether logistics platforms care more about provable behavior or peak performance. The numbers suggest the gap is small enough that the question is worth asking.

No‑ML routing runs 30× faster than reinforcement learning, matches delivery success.

Sources