A 2.5-Year Paper, Withdrawn Over Hallucinated References
A machine learning first author says a coauthor's last-minute citations, generated by a large language model, killed a submission that had earned accept-level scores from every other reviewer.
A machine learning researcher on Reddit spent two and a half years as the lead on a paper, only to watch it die in peer review over citations a coauthor had pasted in at the deadline. The references looked plausible, complete with author names, journal titles, and DOIs, and the coauthor assured the lead they were real. A reviewer eventually checked and found that every one of them was a fabrication generated by a large language model (LLM), a system such as ChatGPT that produces text on demand. The paper was withdrawn, even though the rest of the reviews were strong.
The post, from user treeman0469 on the r/MachineLearning subreddit, is the kind of specific, anonymous grievance that is hard to verify and easy to dismiss, but it captures a failure mode that the published evidence shows is becoming routine ([Unprofessional Coauthor Behavior with Hallucinated References [D]](https://www.reddit.com/r/MachineLearning/comments/1u4m3lz/unprofessional_coauthor_behavior_with/)). The poster says they carried more than 90% of the work and submitted without independently verifying the coauthor's added references. The reviewer who caught the problem treated the discrepancy as a fatal flaw.
The scale of the underlying problem is now a matter of record, not anecdote. An audit of roughly 2.5 million papers in PubMed Central's open-access subset, published in Nature in May, identified nearly 3,000 papers containing references that could not be traced to any real publication, and showed fabricated-citation rates climbing sharply after 2023 (Nature, 8 May 2026). A Retraction Watch analysis of the same audit, drawing on a Lancet letter, estimated the rise at roughly twelvefold over two years, with approximately 1 in 277 papers published in early 2026 referencing work that does not exist (Retraction Watch Weekend Reads, 23 May 2026). A separate Nature news feature in April estimated that tens of thousands of 2025 publications may include invalid AI-generated references (Nature, 1 April 2026).
Preprint repositories are now formalizing a response. arXiv announced in May that researchers whose submissions contain hallucinated references or other incontrovertible signs of unverified generative-AI output will be banned from posting for one year (Nature, 19 May 2026). The policy is recent and, as Nature noted, not universally endorsed, with some researchers arguing that automated flagging will catch too many false positives. But it marks the first time a major preprint server has set a fixed, named penalty for the practice.
The detective work that surfaces these papers is largely volunteer-driven. Guillaume Cabanac, a computer scientist at the University of Toulouse, has become a leading figure in identifying references that resolve to nothing, maintaining a running "problematic paper" list on PubPeer that has caught the attention of mainstream journals. Retraction Watch, a publication that has tracked retractions for two decades, maintains parallel running lists of papers with evidence of ChatGPT-style writing and image-integrity red flags (Retraction Watch running list). Its database now lists more than 65,000 retractions, a total that includes multi-paper clusters at individual institutions driven by problems ranging from plagiarism and data integrity to AI-related concerns.
The harder question is not detection. It is the division of labor inside a research team. Dorothea Baur, an AI ethics consultant, has catalogued five recurring defenses offered for hallucinated references, from defensive deflection ("the model put it there, not me") to resignation ("everyone is doing it"), and has argued that the right response is a desk reject plus a one-year submission ban for the responsible authors (Hallucinated References: Five Excuses for Academic Misconduct). The PNAS essay "Creating a responsible authorship culture in science" frames the same problem from the other end, asking institutions to anchor authorship in transparency, credit, and accountability rather than in who got their name on the byline (PNAS, doi 10.1073/pnas.2531268123).
What the Reddit post actually exposes, more than any one coauthor's lapse, is a workflow gap. In the pre-LLM era, a coauthor adding a reference was a small, low-risk act: a paper either existed or it did not, and the author who added it could be expected to know. Today, an LLM will produce a fully formed, internally consistent citation that no one on the team can recognize as fake by inspection alone. The verification step has become non-optional, and the cost of skipping it has been externalized to the first author and the journal.
The practical fix is small and unglamorous. First authors should require that every new citation in a manuscript be checked against the DOI and the primary source by the person who added it, not by the lead at submission time. Principal investigators should treat citation provenance as a deliverable on par with figures and code, and make it visible in the workflow. Reviewers and program committees who encounter a reference they cannot quickly resolve (where "quickly" means well under the time it would take to check a DOI) should flag it rather than assume the submitting team verified it. The treeman0469 post is anonymous and unverified in its specifics, but the mechanism it describes, a coauthor adding LLM-fabricated references at the deadline, is exactly the failure mode the published evidence says is now common. Treating it as a one-off grievance misses the lesson. It is a workflow problem, and the workflow is fixable.
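For teams that want to make the DOI check a routine step, it can be sketched in a few lines of Python. This is a minimal illustration, not a tool described in the Reddit post or used by any of the sources above. The function names (`extract_dois`, `check_references`, `doi_org_lookup`) are hypothetical, the DOI pattern is a common heuristic rather than a full grammar, and the lookup assumes the doi.org handle API at `https://doi.org/api/handles/` reports `responseCode` 1 for a registered DOI and HTTP 404 for an unknown one.

```python
from __future__ import annotations

import json
import re
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable

# Heuristic for modern DOIs: "10.", a 4-9 digit registrant prefix, "/", a suffix.
# Assumption: good enough for a first pass; unusual legacy DOIs may be missed.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def extract_dois(text: str) -> list[str]:
    """Return the DOIs found in free text, lowercased, in first-seen order."""
    found: list[str] = []
    for match in DOI_PATTERN.findall(text):
        # Drop punctuation that usually belongs to the sentence, not the DOI.
        # (A rare DOI that really ends in ")" would be clipped by this.)
        doi = match.rstrip(".,;)]").lower()  # DOIs are case-insensitive
        if doi not in found:
            found.append(doi)
    return found


def doi_org_lookup(doi: str, timeout: float = 10.0) -> bool:
    """Ask doi.org whether a DOI is registered. Needs network access.

    Assumption: the handle API returns JSON with responseCode == 1 for a
    registered DOI and answers HTTP 404 for one that does not exist.
    """
    url = "https://doi.org/api/handles/" + urllib.parse.quote(doi, safe="/")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("responseCode") == 1
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # Any other failure is "unknown", not "fake"; surface it.


def check_references(text: str, lookup: Callable[[str], bool]) -> dict[str, bool]:
    """Map each DOI found in the text to whether the lookup says it exists."""
    return {doi: lookup(doi) for doi in extract_dois(text)}
```

Calling `check_references(reference_text, doi_org_lookup)` on a pasted reference list would return one entry per DOI, with `False` marking the ones the resolver does not know. Passing the lookup as a parameter keeps the extraction step testable offline. A DOI that resolves is only the first check: a fabricated citation can carry a real DOI that belongs to a different paper, so whoever added the reference still has to compare the title and authors against the primary source.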