Anthropic Must Destroy Pirated Books, Certify Training Data in Court
Under the terms of Anthropic's $1.5 billion copyright settlement, the company has agreed to destroy the original files of pirated books it downloaded from Library Genesis and Pirate Library Mirror within 30 days of final judgment, and to certify to the court whether those datasets were used in training any of its commercially released large language models. According to Courthouse News Service's reporting on the court filing, the settlement terms require Anthropic to represent "whether Library Genesis and Pirate Library Mirror datasets with pirated material were used in training any of the company's commercially released large language models," under penalty of perjury. This is not a press release. It is a representation to a federal court. And it is the first time an AI company has agreed in a binding legal document to identify exactly where its training data came from.
The dollar figure is real and it is historic. The settlement resolves Bartz v. Anthropic, filed in the Northern District of California, where the parties were headed to trial in December 2025 before reaching an agreement in late August. Senior U.S. District Judge William Alsup granted preliminary approval in September; final approval now rests with Judge Araceli Martínez-Olguín. It is the largest copyright class action settlement in United States history.
The certification has not yet been filed. Whether it lands as a narrow disclosure covering broad dataset categories or a detailed accounting of which training runs touched which sources is the unresolved question that will determine how much this settlement actually changes. The destruction order's 30-day clock starts once Judge Martínez-Olguín grants final approval. The settlement does not specify an independent audit mechanism for verifying that the files have been destroyed. The filing does not identify which specific Claude model versions are covered.
A fairness hearing on final approval took place May 14 in San Francisco before Judge Araceli Martínez-Olguín, who took the motion under submission. Anthropic has already paid $300 million into the settlement fund. The remaining payments are scheduled in installments: another $300 million due within five days of final approval, then $450 million on the first anniversary of preliminary approval and $450 million on the second. The estimated payout is roughly $3,100 per work. Roughly 482,000 works are included in the settlement; class members representing approximately 448,000 of those works have filed claims, a 93% participation rate that plaintiffs' attorney Justin Nelson of Susman Godfrey called a signal of the deal's strength. Three hundred and fifty class members opted out. Fifty-three objected at the hearing.
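The figures above can be sanity-checked against each other. A minimal sketch (the dollar amounts and work counts are from the reporting above; the grouping into a list and the variable names are mine):

```python
# Back-of-the-envelope check of the settlement arithmetic reported above.
installments = [
    300_000_000,  # already paid into the settlement fund
    300_000_000,  # due within five days of final approval
    450_000_000,  # first anniversary of preliminary approval
    450_000_000,  # second anniversary of preliminary approval
]
total = sum(installments)

works = 482_000    # works covered by the settlement
claimed = 448_000  # works for which class members filed claims

per_work = total / works
claim_rate = claimed / works

print(f"total fund:  ${total:,}")        # $1,500,000,000
print(f"per work:    ${per_work:,.0f}")  # $3,112, consistent with the ~$3,100 estimate
print(f"claim rate:  {claim_rate:.0%}")  # 93%
```

The gross per-work figure lands slightly above the reported $3,100 estimate; attorneys' fees, expenses, and reserve funds (discussed below) would come out of the same pool.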
The objectors raised issues the court has not resolved. One told Judge Martínez-Olguín that the settlement's definition of an eligible work undercounts the actual damage: a single group copyright registration can cover dozens of independently published novels, all downloaded by Anthropic without permission, but counted as one claimable work. Another raised the exclusion of pseudonymous and self-published authors from the class. A third argued that a one-time payment cannot account for the ongoing commercial value of their work inside a product Anthropic continues to sell. An attorney for a group of objectors asked the court to reopen the opt-out period, citing recently uploaded case documents on the settlement website. The plaintiffs' attorneys' fee request, totaling roughly $300 million, or 20% of the settlement, plus expenses and reserve funds, is also pending.
The destruction order and certification requirement are what make this settlement technically unusual. They do not merely resolve a financial dispute. They create a documented obligation for AI training provenance that the industry has never before agreed to in open court. How much the certification discloses, from broad dataset categories to an inventory of which model versions were trained on which sources, is the detail that engineers, VCs, and regulators will be watching for once Judge Martínez-Olguín rules. How she balances enforceability against the practical limits of what a court can verify about a training run will determine whether this settlement creates a template or remains an anomaly.
The number will be what most outlets lead with. It should not be.