Elias the Lighthouse Keeper Is a Fingerprint of How Chatbots Get Aligned
A Cornell preprint's analysis of 20,000 AI-generated stories from three commercial models points to a shared layer of the training pipeline as the source of the repetition.
When software engineer Daniel May asked a chatbot to tell him a story, he expected something invented. He got Elias Thorne, a lighthouse keeper.
According to a Cornell University preprint first reported by 404 Media, Elias shows up across a startling share of AI-generated fiction. The paper's authors did not set out to study lighthouse keepers. They set out to measure repetition. What they report is more pointed than a quirky pattern: a measurable, reproducible signature that, they argue, traces back to how safety training narrows what these systems are allowed to imagine.
The researchers ran five different prompts through three commercially available small models (OpenAI's GPT-5.4 Mini, Anthropic's Claude Haiku 4.5, and Google's Gemini 3.1 Flash-Lite) and collected roughly 20,000 generated stories. Across all three, the same eleven words and names kept surfacing: lighthouse, keeper, baker, mayor, clockmaker, fisherman, librarian, and conductor, plus the names Mara, Elias, and Elara. Words from that set appeared in 88 percent of all stories. No single pairing appeared more often than Elias the lighthouse keeper, which showed up in two-thirds of all stories generated.
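To make the two headline numbers concrete, here is a minimal Python sketch of how shares like these could be computed: the fraction of stories containing any word from the recurring set, and the fraction containing every word of a specific combination. The article does not describe the preprint's tokenization or matching rules, so the helper names (`tokens`, `share_with_any`, `share_with_all`) and the toy stories are hypothetical.

```python
import re

# The eleven recurring words and names reported in the preprint.
RECURRING = [
    "lighthouse", "keeper", "baker", "mayor", "clockmaker", "fisherman",
    "librarian", "conductor", "mara", "elias", "elara",
]


def tokens(text: str) -> set[str]:
    """Lowercased alphabetic tokens; a crude stand-in for real tokenization."""
    return set(re.findall(r"[a-z]+", text.lower()))


def share_with_any(stories: list[str], words: list[str]) -> float:
    """Fraction of stories that contain at least one of `words`."""
    if not stories:
        return 0.0
    wanted = set(words)
    return sum(1 for s in stories if tokens(s) & wanted) / len(stories)


def share_with_all(stories: list[str], words: list[str]) -> float:
    """Fraction of stories that contain every one of `words`, in any order."""
    if not stories:
        return 0.0
    wanted = set(words)
    return sum(1 for s in stories if wanted <= tokens(s)) / len(stories)


# Hypothetical toy stories, not data from the preprint.
stories = [
    "Elias Thorne was the lighthouse keeper of a small town.",
    "Mara the baker rose before dawn.",
    "A dragon slept beneath the mountain.",
    "Old Elias kept the lighthouse lit every night.",
]

print(share_with_any(stories, RECURRING))                          # 0.75
print(share_with_all(stories, ["elias", "lighthouse", "keeper"]))  # 0.25
```

The "all" check is order-free co-occurrence, so it would also count a story in which Elias is the baker and someone else keeps the lighthouse; matching the exact character pairing would need phrase-level or parse-level rules.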
The preprint's authors ruled out pre-training data as the source: "Elias the lighthouse keeper" does not appear with excess frequency in the pre-training data itself. Instead, they attribute the repetition to the alignment and conversation-tuning stage that all three vendors put their models through before release. As a candidate origin, they point to WildChat, a publicly available corpus of millions of real exchanges between users and chatbots that is widely reused in alignment work. If a single community-built dataset is shaping the creative defaults of three competing models, that is a structural fact about how the current generation of chatbots is built.
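"Excess frequency" is, in effect, a comparison of rates between corpora. The sketch below is a minimal illustration, assuming a simple occurrences-per-million-words rate and a ratio between a target corpus and a reference corpus. The article does not say which statistic the authors used, and both toy corpora are made up rather than drawn from WildChat or any real pre-training set; `per_million` and `excess_ratio` are hypothetical helpers.

```python
def per_million(corpus: list[str], phrase: str) -> float:
    """Occurrences of `phrase` per million whitespace-separated words."""
    total_words = sum(len(doc.split()) for doc in corpus)
    if total_words == 0:
        return 0.0
    # Lowercased substring matching via str.count; not word-boundary aware.
    hits = sum(doc.lower().count(phrase.lower()) for doc in corpus)
    return 1_000_000 * hits / total_words


def excess_ratio(target: list[str], reference: list[str], phrase: str) -> float:
    """How many times more often `phrase` occurs in `target` than in `reference`."""
    ref_rate = per_million(reference, phrase)
    tgt_rate = per_million(target, phrase)
    if ref_rate == 0:
        return float("inf") if tgt_rate > 0 else float("nan")
    return tgt_rate / ref_rate


# Hypothetical toy corpora (18 and 24 words respectively).
chat_like = [
    "Tell me a story.",
    "Elias the lighthouse keeper watched the sea.",
    "Elias the lighthouse keeper lit the lamp.",
]
pretraining_like = [
    "The keeper of the lighthouse was named Tom.",
    "Elias the lighthouse keeper appears once here.",
    "Ships passed in the night.",
    "A baker sold bread.",
]

ratio = excess_ratio(chat_like, pretraining_like, "Elias the lighthouse keeper")
print(round(ratio, 2))  # 2.67
```

A ratio well above 1 means the phrase is over-represented in the target corpus relative to the reference. As reported, the authors found no such excess in pre-training data, which is why they look downstream at the alignment stage.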
The authors theorize that alignment training, meant to steer models away from copyrighted characters and adult content, may have inadvertently compressed the creative distribution onto a narrow pool of "safe" nouns, character archetypes, and names. The result is a model that can write a sonnet but cannot easily imagine a lighthouse keeper with a name other than Elias.
The preprint has not been peer-reviewed. The "guardrails cause repetition" claim is a hypothesis from a single research group, not industry consensus. But what makes this story worth following is not that chatbots have a favorite fictional character. It is that the favorite character is the same across vendors, and that the explanation being offered is structural rather than incidental. Elias is a fingerprint. The interesting question is which datasets it points at, and what a more diverse, better-documented alignment corpus would change about the recurring cast.