Why your chatbot almost always picks 7

ChatGPT, Claude, and Gemini return the same safe answers to open-ended prompts. An Australian startup called Springboards built a model called Flint to break the pattern.

reported by Sky · 4 min read · published July 1, 2026

Open ChatGPT, Claude, or Gemini. Type "give me a random number between 1 and 10." You will almost always get 7. The answer arrives with the confidence of a coin flip that has stopped being random, and it is the same coin on three different services.

That convergence is not a coincidence. It is the visible edge of a design choice the AI industry has made in the open: train models to prefer the most expected, most "correct" answer. The result is what MIT Technology Review calls a "groupthink rut", and it shows up in everyday use, not just trivia. Brainstorming sessions return the same three taglines. Vacation itineraries settle into the same five neighborhoods. Marketing copy gravitates toward the same handful of metaphors. The rut is a feature of the post-training that turned raw language models into usable assistants, and it has been getting deeper as the major labs compete on benchmarks that reward consistency over surprise.

An Australian startup called Springboards is betting $5 million that there is a market on the other side of that trade. Its model, Flint, is pitched as entropy-friendly rather than entropy-resistant. CEO Pip Bingemann told MIT Technology Review that Flint "welcomes" hallucinations instead of fighting them. The same article shows the gap in concrete prompts: ask Flint for a random number and it might return 3.7916 where the major assistants return 7, or ask it to pick a pickup truck and it will name a Ford F-150 as readily as a Toyota or Honda. The point is not that Flint is more correct. The point is that it is less predictable.

The funding is real, if narrow. Springboards announced a $5 million seed round backed by roughly 120 advertising and marketing agencies, a customer base with a direct commercial reason to want fresher output. SmartCompany's weekly roundup corroborates the raise, reporting it alongside news of another Australian startup. Campaign Brief reports that Springboards made The Australian's Top 100 Innovators list, and Tracxn maintains a public profile of the company. The startup is small, but it is not vapor.

The academic case for the rut is older than Springboards. A team of researchers published "Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)" on arXiv, and the NeurIPS 2025 Best Paper Awards recognized the work in November 2025. The paper's argument, in plain terms, is that mainstream models converge on the same answer for open-ended questions because they have been trained to, not because open-ended questions have objectively correct answers. The accompanying GitHub repository makes the reproduction public.

A group at Carnegie Mellon has been measuring this directly. NoveltyBench is a public leaderboard, with the underlying paper on arXiv, that scores models on how much variety they produce across open-ended prompts. Springboards' Flint Alpha model page cites NoveltyBench scores that put the small model ahead of much larger frontier systems on this specific dimension. The numbers are self-reported by Springboards, not independently reproduced, so any claim of state-of-the-art novelty belongs in quotation marks until a third-party evaluation confirms it. The benchmark itself, however, is real and the gap it surfaces is real.
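
For readers who want to try a rough version of this at home, one simple way to put a number on "variety" is to ask the same prompt many times and look at how concentrated the answers are. The sketch below is an illustration only, not NoveltyBench's scoring method: the function names are hypothetical and the sample answers are made up. It computes the Shannon entropy of a batch of answers and the share taken by the single most common one.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy, in bits, of the empirical distribution of answers.

    0 bits means every answer was identical; higher means more spread.
    """
    if not answers:
        raise ValueError("need at least one answer")
    total = len(answers)
    counts = Counter(answers)
    # Sum p * log2(1/p) over each distinct answer.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def top_share(answers):
    """Fraction of the batch taken up by the single most common answer."""
    if not answers:
        raise ValueError("need at least one answer")
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


# Made-up samples: ten replies to "give me a random number between 1 and 10".
rut = ["7"] * 8 + ["3", "4"]              # an assistant stuck on 7
varied = [str(n) for n in range(1, 11)]   # every number appears once

print(f"rut:    top share {top_share(rut):.0%}, entropy {answer_entropy(rut):.2f} bits")
print(f"varied: top share {top_share(varied):.0%}, entropy {answer_entropy(varied):.2f} bits")
```

Exact string matching only works for short answers such as numbers. Slogans or itineraries would need some way of grouping near-duplicate answers before counting, which this sketch does not attempt.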

Springboards offers a second concrete test of the difference: ask the major assistants for a campaign slogan for a running shoe. You get variations on "Run your way" or "Built to last, run to win." Ask Flint and the answer lands somewhere less rehearsed. It is a small thing. It is also exactly the kind of small thing that makes a planner keep a tool around or delete it after a week.

The honest framing matters here. Flint is one startup's bet, not a peer-reviewed result. "More varied" is not "more correct," and the hallucination trade-off is the whole point of the alternative design, not an asterisk on it. A model that welcomes surprises will produce more wrong answers alongside more interesting ones. For marketing copy and creative ideation that is a feature. For medical summarization or legal drafting it would be a bug. The industry converged on the safe answer because most uses punish surprise; Springboards is going after the small slice of uses that pay for it.

What to watch next is whether independent evaluators reproduce Flint's NoveltyBench numbers and confirm the ranking the company claims, and whether any of the frontier labs decide the creative-assistant slice is worth chasing with their own entropy-friendly mode. If they do, Springboards will have a target on its back and a clear thesis to point at. If they don't, the company's $5 million buys it time to grow into a niche where the rut hurts the most.

Sources