Britannica's 3-Pronged Attack Could Expose AI's Inference-Time Liability
The encyclopedia publisher, joined by Merriam-Webster, alleges OpenAI copied its content to train GPT models and is now eating into Britannica's web traffic with AI-generated summaries.

Encyclopedia Britannica is suing OpenAI, and the legal argument is more specific than the typical AI copyright case.
Britannica and its subsidiary Merriam-Webster filed suit on March 13 in Manhattan federal court, alleging OpenAI scraped nearly 100,000 Britannica articles to train GPT models, then reproduced those articles verbatim in model outputs and used them in ChatGPT's retrieval-augmented generation (RAG) workflow. The complaint pursues three distinct legal theories, each with different implications for the AI industry.
The first claim is straightforward copyright infringement around training data — the same theory advanced by the New York Times, Ziff Davis, and a wave of newspapers. Britannica alleges OpenAI copied its content without license or compensation.
The second claim is different. Britannica argues that GPT-4 outputs near-verbatim passages of its articles on demand, and the filing includes side-by-side comparisons showing word-for-word matches. This isn't about training — it's about ongoing reproduction each time the model emits protected text. If courts accept this framing, AI companies face liability not just for how models were trained, but for what they reproduce at inference time.
The third claim is the one most AI copyright cases haven't touched. Britannica alleges OpenAI violates the Lanham Act when ChatGPT generates confident false statements and attributes them to Britannica's brand — fabrications that could damage the publisher's reputation for accuracy. This is a trademark and false advertising theory, not a copyright one, and it opens a separate liability path for AI companies deploying language models in domains where source accuracy matters.
Britannica is also challenging OpenAI's use of its content in ChatGPT's RAG workflow, where the model scans external databases to answer queries about updated or specialized information. Britannica argues this constitutes a separate infringement vector — accessing and using its content at inference time, not just during training. If courts accept this, it would have significant implications for how RAG systems are designed and what data they're permitted to access.
The verbatim output and RAG theories are what make this case distinct from the NYT lawsuit. Britannica isn't just arguing training was infringing — it's arguing that the model's ongoing outputs and retrieval patterns are themselves infringing acts.
OpenAI's response has been consistent: "Our models empower innovation, and are trained on publicly available data and grounded in fair use." The company disputes the underlying allegations. OpenAI did not respond to TechCrunch's request for comment before that outlet published its story; the quote in Reuters' coverage came from Reuters' own outreach to the company.
The fair use question remains genuinely unsettled. In the authors' case against Anthropic, which ended in a $1.5 billion settlement, federal Judge William Alsup found that using books to train AI could be transformative — but also found that Anthropic had illegally downloaded pirated copies rather than licensing them, which drove the settlement. Britannica is arguing both that training was infringing AND that outputs are infringing, which could produce a different damages calculation if either theory holds.
Britannica also argues that ChatGPT is cannibalizing its web traffic — answering queries that would otherwise send users to Britannica's website, functioning like a search engine that never passes users along to the source. That mirrors arguments from news publishers and underscores a business model threat that runs parallel to the legal one.
The case is assigned to the Southern District of New York. A similar Britannica lawsuit against Perplexity AI, filed last year, is still ongoing.
For builders: the practical risk is no longer just training data. The RAG claim means that using third-party content to ground answers in production systems carries its own legal exposure. And the hallucination-attribution theory suggests that companies deploying AI in accuracy-sensitive domains face liability beyond copyright infringement.
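To make the inference-time distinction concrete, here is a minimal, toy sketch of a RAG step — the in-memory corpus, keyword-overlap scoring, and prompt format are all illustrative assumptions, not anything from OpenAI's actual pipeline. The point it illustrates: retrieved third-party text is copied into the prompt at query time, an act separate from anything that happened during training.

```python
import string

def tokenize(text: str) -> set[str]:
    """Lowercase word set with punctuation stripped -- a toy overlap score."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())

def retrieve(query: str, corpus: dict[str, str], k: int = 1) -> list[tuple[str, str]]:
    """Rank documents by word overlap with the query; return the top-k (title, text)."""
    q = tokenize(query)
    ranked = sorted(corpus.items(),
                    key=lambda item: len(q & tokenize(item[1])),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, corpus: dict[str, str]) -> str:
    """Copy retrieved third-party text into the prompt at inference time --
    this copying is the act the RAG infringement theory targets."""
    context = "\n".join(f"[{title}] {text}" for title, text in retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

# Hypothetical stand-ins for third-party reference articles.
corpus = {
    "Photosynthesis": "Photosynthesis converts light energy into chemical energy in plants.",
    "Mitosis": "Mitosis is the process by which a cell divides into two identical cells.",
}

print(build_prompt("How do plants convert light energy?", corpus))
```

A production system would swap the overlap score for vector search and send the prompt to a hosted model, but the legal shape is the same: the grounding text is fetched and reproduced on every query, which is why the complaint treats RAG as its own infringement vector.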
The complaint is available in full.

