The hidden storage tax on AI infrastructure
Only about 28% of AI infrastructure and operations projects deliver meaningful ROI, Gartner says. A new explainer argues that one of the most common bottlenecks is the storage system sitting next to the GPUs.
GPUs bill by the hour, but the storage system sitting next to them in the rack decides how much of that hour actually becomes useful AI work. That gap is one reason so many AI infrastructure projects never earn back their investment: the compute is paid for, but it is not fully used.
According to Gartner's April 2026 infrastructure and operations research, only about 28% of AI infrastructure and operations projects deliver meaningful return on investment. Most stall before they ever pay back. The money is real, the GPUs are real, the data is real. What is missing is the throughline that turns capital spending into productive inference and training throughput. A Register explainer argues that the storage stack is one of the most common places that throughline breaks.
The mechanism is unglamorous. A modern AI training run does not just read a dataset once. It reads the same data many times across many passes, streams checkpoints (the periodic snapshots that let a long job resume if a node crashes), and shuffles parameter updates across many GPUs working in parallel. The reads and checkpoint writes are input/output (I/O) requests against the storage layer, and the GPU cannot keep going until the data it asked for arrives or a blocking write completes. If the storage layer cannot feed data fast enough, the GPU sits idle. Idle GPU time is not refunded: the rack still draws power and the cluster still shows up on the invoice.
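A back-of-the-envelope sketch in Python can make that mechanism concrete. It assumes a deliberately simple model in which the GPUs stall whenever storage delivers less data than they need, with no prefetching or caching to hide the gap. The function name and every number are hypothetical illustrations, not figures from Gartner or The Register.

```python
def gpu_utilization(required_gbps: float, delivered_gbps: float) -> float:
    """Fraction of time the GPUs stay busy under a simple feed-rate model.

    Assumption: the GPUs need `required_gbps` of input to run flat out and
    stall in proportion to any shortfall. Real pipelines prefetch and cache,
    so this is a rough estimate, not a measurement.
    """
    if required_gbps <= 0:
        raise ValueError("required_gbps must be positive")
    if delivered_gbps < 0:
        raise ValueError("delivered_gbps must not be negative")
    # Utilization cannot exceed 100% even if storage is faster than needed.
    return min(1.0, delivered_gbps / required_gbps)


# Hypothetical cluster: needs 40 GB/s to stay busy, storage delivers 26 GB/s.
util = gpu_utilization(required_gbps=40.0, delivered_gbps=26.0)
idle_hours_per_day = (1.0 - util) * 24
print(f"utilization: {util:.0%}, idle hours per day: {idle_hours_per_day:.1f}")
```

With these made-up inputs the model reports 65% utilization, or roughly 8.4 paid-for but idle hours per day. The point is the shape of the relationship, not the specific values.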
The financial shape of that mismatch is what makes it a planning problem rather than a tuning problem. A GPU rack that costs several million dollars and runs at a fraction of its rated utilization is not returning the ROI the budget assumed. The Register's coverage of the Gartner findings framed the same pattern from the demand side: AI infrastructure spending keeps climbing, but the share of projects that produce measurable business outcomes has not kept pace. The bottleneck, in other words, has moved. It is no longer mostly about whether a team can buy enough GPUs. It is about whether the rest of the stack can keep those GPUs busy.
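The budget effect can be sketched the same way. The snippet below assumes a flat hourly price per GPU and treats utilization as the fraction of each billed hour that does useful work. The $3.00 rate and the utilization levels are hypothetical placeholders, not figures from the source.

```python
def cost_per_useful_gpu_hour(hourly_rate: float, utilization: float) -> float:
    """Effective price of one hour of useful GPU work.

    Assumption: the bill is the same whether the GPU is busy or idle, so the
    cost of a useful hour is the billed rate divided by utilization.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in the range (0, 1]")
    return hourly_rate / utilization


rate = 3.00  # hypothetical price per GPU-hour, in dollars
for util in (0.9, 0.6, 0.3):
    effective = cost_per_useful_gpu_hour(rate, util)
    print(f"utilization {util:.0%}: ${effective:.2f} per useful GPU-hour")
```

Under these assumptions, a cluster running at 30% utilization pays about $10 for every hour of useful work, against about $3.33 at 90%. The invoice is identical in both cases, which is why the gap behaves like a planning problem.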
What "GPU-starved" looks like in practice is a queue. During training, GPUs issue read requests for the next batch of training examples. If the storage system answers slowly, the batch waits. During checkpointing, every GPU has to coordinate a write across the whole cluster, and a slow storage tier turns that into a multi-minute pause where nothing else runs. Inference workloads have a different version of the same problem: latency-sensitive requests pile up while the system reads model weights, prompt context, and lookup tables from a slow medium. None of this shows up as a storage outage. It shows up as a utilization dashboard that never quite hits the number the capex model assumed.
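The checkpoint pause described above lends itself to a rough estimate. This sketch assumes a synchronous checkpoint that blocks training until the write finishes, and a storage tier with a fixed aggregate write bandwidth. The checkpoint size, interval, and bandwidth figures are hypothetical.

```python
def checkpoint_pause_seconds(checkpoint_gb: float, write_gbps: float) -> float:
    """Time the cluster waits for one blocking checkpoint write."""
    if write_gbps <= 0:
        raise ValueError("write_gbps must be positive")
    return checkpoint_gb / write_gbps


def stall_fraction(pause_s: float, interval_s: float) -> float:
    """Share of wall-clock time lost to checkpoint pauses.

    Assumption: training runs for `interval_s`, then stops completely for
    `pause_s` while the checkpoint is written, and the cycle repeats.
    """
    return pause_s / (pause_s + interval_s)


checkpoint_gb = 2000.0   # hypothetical 2 TB checkpoint
interval_s = 30 * 60     # hypothetical: 30 minutes of training between checkpoints
for write_gbps in (10.0, 100.0):
    pause = checkpoint_pause_seconds(checkpoint_gb, write_gbps)
    lost = stall_fraction(pause, interval_s)
    print(f"{write_gbps:.0f} GB/s: pause {pause:.0f} s, time lost {lost:.1%}")
```

In this toy model the slower tier pauses the whole cluster for more than three minutes per checkpoint and gives up about 10% of wall-clock time, while the faster tier loses about 1%. Asynchronous checkpointing, which this sketch does not model, would change the picture.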
The architectural choices that close the gap are not magic fixes; they are tradeoffs. Parallel file systems spread data across many disks and many servers so reads and writes can happen in parallel, which raises throughput at the cost of operational complexity and licensing. NVMe fabrics (NVMe over Fabrics), a way of extending fast flash storage across a network of devices rather than just within one server, raise throughput further at a meaningful price. Tiering keeps hot data (the training corpus or the most-used model weights) on fast media and pushes cold data to cheaper bulk storage, which controls cost but only works if the data placement is right. Kernel-bypass networking lets the GPU talk to storage with less CPU overhead, which helps in latency-sensitive inference paths but adds engineering burden. None of these is automatically the right answer. Each is a different point on the curve between cost, complexity, and how much of a paid-for GPU hour actually becomes useful AI work.
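The caveat about tiering, that it only works if data placement is right, can be illustrated with a small calculation. The sketch assumes every read is served entirely by either the fast tier or the slow tier, with no overlap between them, so the blended rate is a weighted harmonic mean of the two. The tier speeds and hit ratios are hypothetical.

```python
def effective_read_gbps(hot_hit_ratio: float, fast_gbps: float, slow_gbps: float) -> float:
    """Blended read throughput for a two-tier storage layout.

    Assumption: a fraction `hot_hit_ratio` of bytes comes from the fast tier
    and the rest from the slow tier, one after the other. Time per GB is the
    weighted sum of each tier's time per GB; throughput is its reciprocal.
    """
    if not 0.0 <= hot_hit_ratio <= 1.0:
        raise ValueError("hot_hit_ratio must be between 0 and 1")
    if fast_gbps <= 0 or slow_gbps <= 0:
        raise ValueError("tier throughputs must be positive")
    seconds_per_gb = hot_hit_ratio / fast_gbps + (1.0 - hot_hit_ratio) / slow_gbps
    return 1.0 / seconds_per_gb


fast, slow = 50.0, 5.0  # hypothetical tier speeds in GB/s
for hit in (0.95, 0.80, 0.50):
    print(f"hot hit ratio {hit:.0%}: {effective_read_gbps(hit, fast, slow):.1f} GB/s")
```

Even with 95% of reads landing on the fast tier, the blended rate in this model is about 34.5 GB/s rather than 50, and at a 50% hit ratio it falls to about 9.1 GB/s. A small share of misplaced data drags the whole pipeline toward the slow tier's speed.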
That is the question worth taking into a planning conversation. Not "what storage should we buy?" but "what fraction of our most expensive hardware is currently doing useful work, and what architecture gets that fraction closer to the number we modeled?" The projects that close the gap tend to start by admitting the gap exists, in concrete utilization terms, before they sign the next storage purchase order.