Copyright Office Weighs in on AI Training Infringement
May 29, 2025
The U.S. Copyright Office has published the third part of its Report on Copyright and Artificial Intelligence as a “pre-publication” draft (we wrote about the second part here). Part three focuses on the legal analysis used to determine copyright infringement in AI-related cases. A recent Sidley memo highlights key takeaways, including how an AI model’s “weights” affect the infringement analysis:
“According to the Report, if the model can generate an identical or nearly identical copy of the underlying work without that expression being provided in the form of a prompt or input, there is a strong argument that the model’s “weights” — numerical parameters that determine the importance of dataset features — could implicate the right of reproduction. Model weights that have memorized protectable expression from training data may also infringe the derivative work right.”
Copyright infringement can occur both when building an AI training dataset and when AI outputs too closely resemble the training data. AI developers may be liable for infringement in building the dataset, while end users could face liability for infringing outputs. As the memo goes on to say:
“The Report notes that whether a model’s weights implicate the reproduction or derivative work rights turns on whether the model has retained or memorized ‘substantial protectable expression’ from the underlying works. In such an instance, distributing, fine-tuning, or deploying a model could expose developers and downstream users to liability for infringement.”
As we’ve noted previously, AI litigation is on the rise. If copyright infringement cases against AI providers succeed, the plaintiffs’ bar may start looking for more defendants with deep pockets. That is why it is critical for companies to know where their AI model’s training data comes from, whether the model is developed in-house or licensed from a third party.