The Great Scrape: The Clash Between Scraping and Privacy

I’m posting a new article draft with Professor Woodrow Hartzog (BU Law), The Great Scrape: The Clash Between Scraping and Privacy. We argue that “scraping” – the automated extraction of large amounts of data from the internet – is in fundamental tension with privacy. Scraping is generally anathema to the core principles of privacy that form the backbone of most privacy laws, frameworks, and codes.

You can download the article for free on SSRN.

Here’s the abstract:

Artificial intelligence (AI) systems depend on massive quantities of data, often gathered by “scraping” – the automated extraction of large amounts of data from the internet. A great deal of scraped data is about people. This personal data provides the grist for AI tools such as facial recognition, deep fakes, and generative AI. Although scraping enables web searching, archival, and meaningful scientific research, scraping for AI can also be objectionable or even harmful to individuals and society.

Organizations are scraping at an escalating pace and scale, even though many privacy laws are seemingly incongruous with the practice. In this Article, we contend that scraping must undergo a serious reckoning with privacy law. Scraping violates nearly all of the key principles in privacy laws, including fairness; individual rights and control; transparency; consent; purpose specification and secondary use restrictions; data minimization; onward transfer; and data security. With scraping, data protection laws built around these requirements are ignored.

Scraping has evaded a reckoning with privacy law largely because scrapers act as if all publicly available data were free for the taking. But the public availability of scraped data shouldn’t give scrapers a free pass. Privacy law regularly protects publicly available data, and privacy principles are implicated even when personal data is accessible to others.

This Article explores the fundamental tension between scraping and privacy law. With the zealous pursuit and astronomical growth of AI, we are in the midst of what we call the “great scrape.” There must now be a great reconciliation.

* * * *

Professor Daniel J. Solove is a law professor at George Washington University Law School. Through his company, TeachPrivacy, he has created the largest library of computer-based privacy and data security training, with more than 150 courses. He is also the co-organizer of the Privacy + Security Forum events for privacy professionals.

PRIVACY + SECURITY BLOG

News, Developments, and Insights
