The half-litre-bottle figure that has been passed around for years is real, but the way it has travelled leaves out the clauses that decide what it actually measures. It comes from a single 2023 paper that estimated how much water one specific 2020-era large language model, GPT-3, would consume. That estimate covers the water evaporated by data-centre cooling and the water used to generate the model's electricity, both during training and when answering queries. None of those numbers are metered readings, and the number of replies a bottle covers can swing several-fold depending on where and when the model runs.
The paper in question is "Making AI Less 'Thirsty'" by Pengfei Li, Jianyi Yang, Mohammad A. Islam, and Shaolei Ren, a group led by Ren at the University of California, Riverside. It was posted to arXiv (an open-access preprint server) in April 2023 and later published, after peer review, in Communications of the ACM. Its release drew a UCR press write-up and trade coverage in IEEE Spectrum, including the now-viral line from the authors: "GPT-3 needs to 'drink' (i.e., consume) a 500ml bottle of water for roughly 10–50 medium-length responses, depending on when and where it is deployed."
Two things matter about that quote. First, the system it describes is GPT-3, a large language model OpenAI released in 2020. Today's frontier systems are bigger, are served differently, and run in different facilities. Second, "when and where" is not throat-clearing. The authors tie the per-query figure to two local factors. The first is on-site water-use effectiveness (WUE): how many litres of water a data centre's cooling evaporates per kilowatt-hour of IT energy. That ratio is set by ambient temperature, humidity, and cooling architecture. The second is the water intensity of the regional electricity grid, which governs the off-site water consumed to generate the power. Microsoft's own datacenter efficiency methodology publishes regional WUE figures that vary widely across its fleet. That spread is one reason the same prompt can carry a very different water bill in Phoenix and in Quebec.
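To see how much "when and where" can move the number, here is a minimal Python sketch. It follows roughly the on-site-plus-off-site split described above. On-site water is the IT energy multiplied by WUE. Off-site water is the facility's total energy multiplied by the grid's water intensity. The conversion from IT energy to facility energy uses power usage effectiveness (PUE), which is our added assumption. The per-query energy and every site parameter below are invented for illustration. They are not values from the paper or from any operator's reporting, and the site names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical deployment conditions; every number used below is illustrative."""
    name: str
    wue_onsite: float  # litres evaporated on-site per kWh of IT energy
    pue: float         # power usage effectiveness: facility energy / IT energy (assumed)
    ewif: float        # litres consumed per kWh generated on the local grid


def water_per_query_ml(energy_kwh: float, site: Site) -> float:
    """On-site cooling water plus off-site water embedded in the electricity, in mL."""
    onsite_l = energy_kwh * site.wue_onsite        # evaporated at the data centre
    offsite_l = energy_kwh * site.pue * site.ewif  # consumed at the power plants
    return (onsite_l + offsite_l) * 1000.0


# Hypothetical per-query IT energy; not a measured value for any real model.
ENERGY_KWH = 0.004

hot_dry = Site("hot, dry site in summer", wue_onsite=2.0, pue=1.2, ewif=4.0)
cool = Site("cool site, low-water grid", wue_onsite=0.2, pue=1.1, ewif=1.0)

for site in (hot_dry, cool):
    print(f"{site.name}: {water_per_query_ml(ENERGY_KWH, site):.1f} mL per query")
```

With these made-up inputs, the identical query uses roughly five times as much water at the hot, dry site. In this model, both the climate (through WUE) and the grid (through its water intensity) can shift with the hour and the season, not only with the location.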
The headline training number is built the same way: roughly 700,000 litres of freshwater evaporated on-site during GPT-3's training run. To reach it, the authors combined Microsoft's public per-region WUE values with a published estimate of GPT-3's training energy, which gave them the direct evaporative cooling losses. A second figure, around 5.4 million litres, adds the water consumed to generate the electricity GPT-3 used during training. That one is an estimate stacked on top of an estimate, because OpenAI never disclosed where GPT-3 was trained. Ren's follow-up with Amy Luers is explicit about the asymmetry. The on-site number is closer to a model output than a measurement, and the combined figure layers on further assumptions about grid water intensity that change from year to year. The methodology and inputs are open in the team's GitHub repository, which is part of why the work has travelled as far as it has.
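The gap between the two training figures also shows which assumption carries the most weight. The short Python sketch below takes the reported totals at face value and computes the off-site share. It then rescales only that share by a few factors, each standing in for a different assumption about grid water intensity. The scaling factors and the function name are hypothetical. They are not ranges from the paper.

```python
# Figures reported in the paper, in litres (both are estimates, not meter readings).
ONSITE_L = 700_000    # on-site evaporative cooling during training
TOTAL_L = 5_400_000   # on-site plus water consumed to generate the electricity


def total_with_grid_scaling(scale: float) -> float:
    """Rescale only the off-site (grid) share of the total.

    'scale' is a hypothetical multiplier on the assumed grid water intensity;
    the on-site estimate is held fixed.
    """
    offsite_l = TOTAL_L - ONSITE_L
    return ONSITE_L + offsite_l * scale


offsite_share = (TOTAL_L - ONSITE_L) / TOTAL_L
print(f"off-site share of the total: {offsite_share:.0%}")

for scale in (0.5, 1.0, 1.5):
    total_m = total_with_grid_scaling(scale) / 1e6
    print(f"grid intensity x{scale}: {total_m:.2f} million litres")
```

On these numbers, about 87 per cent of the 5.4 million litres sits in the grid term. Halving that assumption, or raising it by half, moves the headline by more than two million litres, while the on-site estimate does not change.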
What the figure does not do is tell you what a single ChatGPT session costs in water. The authors themselves describe the 500 mL per 10–50 replies as conservative and explicitly bound to a location and a time. It covers serving the model's replies. It leaves out the training water amortised over the model's lifetime and the embodied water in manufacturing the hardware. It also predates the current generation of frontier systems and the data centres that serve them.
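Dividing the bottle by the quoted reply counts makes the per-reply range explicit. This short Python sketch uses only the numbers in the authors' quote. The function name is our own, for illustration.

```python
BOTTLE_ML = 500                      # the half-litre bottle in the quote
REPLIES_LOW, REPLIES_HIGH = 10, 50   # "roughly 10-50 medium-length responses"


def per_reply_range_ml(bottle_ml: float, low: int, high: int) -> tuple[float, float]:
    """Return (best case, worst case) water per reply in mL.

    More replies per bottle means less water per reply, so the best case
    divides by the high count and the worst case by the low count.
    """
    return bottle_ml / high, bottle_ml / low


best, worst = per_reply_range_ml(BOTTLE_ML, REPLIES_LOW, REPLIES_HIGH)
print(f"{best:.0f}-{worst:.0f} mL per reply, a {worst / best:.0f}x spread")
```

A single reply therefore falls somewhere between about 10 mL and about 50 mL. That five-fold spread comes from location and timing alone, before any of the exclusions above are counted. It is also why reading the bottle as a per-reply cost overstates the figure by at least a factor of ten.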
Industry has begun to address some of these gaps, and not always in a way that simplifies the picture. Microsoft's 2024 Environmental Sustainability Report and a December 2024 Cloud blog post describe next-generation facilities that cool with no on-site water evaporation. That is a forward-looking design claim about a subset of Microsoft's fleet, not a description of the facilities that trained GPT-3. The company's own efficiency page makes clear that older sites still depend on evaporative cooling in hot climates. Amazon, after years of disclosing mostly electricity use, began publishing annual data centre water figures. Latitude Media framed that shift as the end of a transparency "black box", at least for one operator. Google has been incrementally opening up its own water reporting as well.
The corrective the paper's authors are asking for is modest and useful. AI's water footprint is real and worth measuring. The most-cited number in the field is a snapshot from one 2020-era model, derived from public efficiency data and assumptions about location, and it should travel with the clause that makes it intelligible. The next time the half-litre figure shows up in a headline, the useful question is not whether AI is thirsty. It is where the model is running, at what hour, on which grid, and how that grid's water intensity is trending this year.