Ask Claude, ChatGPT, or Gemini for a random integer between 1 and 10 and you'll almost certainly get back the same answer: 7. The response is supposed to be stochastic, a weighted draw the model makes each time it picks a token, but the draw keeps landing on the same number. A new academic preprint and at least one well-funded startup argue that the pattern is not a glitch. It is a structural property of how modern language models are trained to talk.
That empirical observation, that "7" dominates across major chatbots, is documented in multiple MIT Technology Review pieces from June and July 2026 and formalized in a June arXiv preprint, "Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)".
That matters because the same property shows up everywhere else. According to a July roundup in MIT Technology Review's "The Download", the homogeneity is easiest to spot on open-ended prompts: ask several chatbots the same creative question and they often return near-identical ideas, voices, and structures. For coding, math, and retrieval, that convergence is often a feature, since a "modal answer" is a useful default. For brainstorming, planning, or drafting, it is a constraint that most users do not realize they are working under.
Researchers have started giving it a name. The "Artificial Hivemind" preprint, posted to arXiv in late June, treats the convergence as a measurable phenomenon rather than an anecdote. The mechanism, in plain terms, is statistical. Large language models are trained on text written by humans, and human-written text clusters around shared conventions, idioms, and opinions. A model trained to predict the next token learns to favor the most likely next word, not the most original one. Even when the prompt leaves room for variation, the most likely answer tends to be the modal one: the answer most other writers and models have already produced. The more thoroughly a training corpus has converged on consensus, the harder it is for any model learning from that corpus to escape the gravity well.
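The statistical pull described above can be illustrated with a toy sampler. This is a minimal sketch, not a description of how any particular chatbot works: the scores in `TOY_LOGITS` are invented for illustration, real models score tokens from a large vocabulary rather than ten integers, and the `temperature` knob is a standard decoding setting, not something the article's sources discuss. The point is only that when one option carries most of the probability mass, repeated "random" draws keep returning it.

```python
import math
import random
from collections import Counter

# Hypothetical scores for the answers 1..10. The numbers are made up for
# illustration; 7 is simply given the highest score.
TOY_LOGITS = {1: 0.2, 2: 0.6, 3: 1.4, 4: 1.1, 5: 1.0,
              6: 0.9, 7: 3.0, 8: 1.2, 9: 0.5, 10: 0.3}


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the peak."""
    scaled = {k: v / temperature for k, v in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - top) for k, v in scaled.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def sample_answers(logits, temperature, n, seed=0):
    """Draw n answers from the temperature-scaled distribution and count them."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    probs = softmax(logits, temperature)
    draws = rng.choices(list(probs), weights=list(probs.values()), k=n)
    return Counter(draws)


def modal_share(counts):
    """Return the most frequent answer and the fraction of draws it took."""
    answer, hits = counts.most_common(1)[0]
    return answer, hits / sum(counts.values())


if __name__ == "__main__":
    for temperature in (0.5, 1.0, 2.0):
        counts = sample_answers(TOY_LOGITS, temperature, n=10_000)
        answer, share = modal_share(counts)
        print(f"temperature={temperature}: modal answer {answer}, share {share:.2f}")
```

With these invented scores, 7 remains the most common answer at every temperature tried, and its share grows as the temperature drops. Raising the temperature spreads the draws out but does not change which answer the distribution favors, which is one way to read the argument that the convergence comes from training and not from the sampling step.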
That training-time pull is what an Australian startup called Springboards wants to push back against. The company has spent the past two years building an LLM named Flint that is explicitly optimized for response variety, not just prediction accuracy, according to MIT Technology Review's feature on the launch. Springboards raised a $5 million seed round and lined up more than 120 agencies as design partners, the kinds of teams that pay for brainstorming tools and notice immediately when those tools start repeating themselves (company announcement on the seed round).
The company's own framing is blunt: most AI assistants have a "habit of predictable, boring answers" baked into how they are trained, and Flint is positioned as the antidote (Springboards blog post on the Flint launch). The technical story is consistent with the academic one. A model rewarded for producing the modal answer produces that answer again and again. A model rewarded for producing a different one, when different is what the prompt calls for, has at least a chance of doing so.
The honest caveats matter. Flint is a single startup's response to a problem the academic literature has only just started to measure, and the arXiv paper is a preprint that has not yet been peer reviewed. The diversity metrics it relies on are still being stress-tested by other groups, and adoption claims from a company-controlled blog need independent corroboration before they count as evidence of demand. "More varied" is also not the same as "more accurate" or "more truthful." A model trained to escape the modal answer can be wrong in new ways, including confidently wrong in directions that happen to feel fresh. Convergence is a feature for tasks where one answer is genuinely the right one, and a bug for tasks where the point is to surface options that have not yet been surfaced at all.
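For a sense of what measuring response variety can look like, here is a rough sketch of two simple scores computed over repeated answers to the same prompt. These are generic, exact-match measures chosen for illustration; they are not the metrics used in the "Artificial Hivemind" preprint, and the function names `distinct_ratio` and `normalized_entropy` are hypothetical. Exact matching also misses paraphrases, so a serious evaluation would likely need some notion of semantic similarity.

```python
import math
from collections import Counter


def _normalize(responses):
    """Lowercase and trim so trivial formatting differences do not count as variety."""
    return [r.strip().lower() for r in responses]


def distinct_ratio(responses):
    """Fraction of responses that are unique: 1.0 means no two answers matched."""
    cleaned = _normalize(responses)
    if not cleaned:
        return 0.0
    return len(set(cleaned)) / len(cleaned)


def normalized_entropy(responses):
    """Shannon entropy of the answer counts, scaled to the range 0..1.

    0.0 means every response was identical; 1.0 means every response
    was different from all the others.
    """
    cleaned = _normalize(responses)
    n = len(cleaned)
    if n < 2:
        return 0.0
    counts = Counter(cleaned)
    entropy = sum(-(c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(n)  # log(n) is the maximum possible entropy


if __name__ == "__main__":
    # Made-up sample: ten replies to "pick a number from 1 to 10".
    replies = ["7", "7", "7", "3", "7", "7", "4", "7", "7", "8"]
    print("distinct ratio:", distinct_ratio(replies))
    print("normalized entropy:", round(normalized_entropy(replies), 3))
```

Neither score says anything about whether the answers are correct, which is the caveat above in miniature: a model can score high on variety and still be wrong.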
MIT Technology Review flagged the structural concern earlier in the summer in a feature on what its authors called a "bottleneck holding back LLMs", treating output convergence as a live research thread rather than a marketing slogan. Whether Springboards' bet pays off is one data point. Whether the broader lab ecosystem starts treating response variety as a first-class evaluation metric, rather than a side effect of next-token prediction, is the larger watch item.