Data Scraping: Overview of Intellectual Property Issues

by John Jenkins

March 11, 2025

Data scraping, as discussed here, involves collecting AI training data from websites, and the practice raises a variety of legal issues, not the least of which is compliance with intellectual property laws. For example, scraping copyrighted materials for use in training AI models may constitute copyright infringement, and litigation over this issue is increasing globally. A recent OECD report discusses the intellectual property issues associated with data scraping and suggests potential policy approaches. Here’s the abstract:

Recent technological advances in artificial intelligence (AI), especially the rise of generative AI, have raised questions regarding the intellectual property (IP) landscape. As the demand for AI training data surges, certain data collection methods give rise to concerns about the protection of IP and other rights. This report provides an overview of key issues at the intersection of AI and some IP rights. It aims to facilitate a greater understanding of data scraping — a primary method for obtaining AI training data needed to develop many large language models. It analyses data scraping techniques, identifies key stakeholders, and worldwide legal and regulatory responses. Finally, it offers preliminary considerations and potential policy approaches to help guide policymakers in navigating these issues, ensuring that AI’s innovative potential is unleashed while protecting IP and other rights.

The report's recommendations include the adoption of a voluntary data scraping code of conduct; the implementation of standard technical tools to help protect IP rights and enable rights holders to better manage access to their data; the use of standard contract terms to address legal and operational issues associated with data scraping; and efforts to raise awareness of data scraping and its legal implications.