Recently, AOL released about 20 million search queries of over 650,000 users to researchers. As the Washington Post reported:
AOL released the Internet search terms that more than 650,000 of its subscribers entered over a three-month period and admitted Monday that what it originally intended as a gesture to researchers amounted to a privacy breach and a mistake.
Although AOL had substituted numeric IDs for the subscribers’ real user names, the company acknowledged the search queries themselves may contain personally identifiable data.
The disclosure was made without AOL users’ consent. AOL thought that it wasn’t threatening anybody’s privacy because it had stripped identifying information from the searches. But as the New York Times demonstrated, individuals can be identified from their search queries alone:
Buried in a list of 20 million Web search queries collected by AOL and recently released on the Internet is user No. 4417749. The number was assigned by the company to protect the searcher’s anonymity, but it is not much of a shield.
No. 4417749 conducted hundreds of searches over a three-month period on topics ranging from “numb fingers” to “60 single men” to “dog that urinates on everything.”
And search by search, click by click, the identity of AOL user No. 4417749 became easier to discern. There are queries for “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.”
It did not take much investigating to follow that data trail to Thelma Arnold, a 62-year old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs. “Those are my searches,” she said, after a reporter read part of the list to her.
These stories bring back memories of the government’s recent attempt to subpoena search queries from Google and other Internet search engines. I blogged about the case extensively; the posts are listed at the end of this piece. The government claimed that privacy wasn’t threatened because it wasn’t seeking the identities of the individuals doing the searches.
In recent remarks, Google’s chief executive noted that government access to search query data remained an issue of major concern for the company. Reuters reports:
Web search leader Google Inc., which stores vast amounts of data on the Web surfing habits of its users, sees government intrusions rather than accidental public disclosures of data as the greatest threat to online privacy, its chief executive said on Wednesday.
CEO Eric Schmidt told the Search Engine Strategies industry conference here that Google had put all necessary safeguards in place to protect its users’ personal data from theft or accidental release. . . .
But Schmidt said a more serious threat to user privacy lay in potential demands on Google by governments to make the company give up data on its customer’s surfing habits.
“You can never say never,” Schmidt said during an onstage interview with Web search industry analyst Danny Sullivan.
I’m quite pleased that Google appears to have recognized the significant threat that government access to search query data can pose. But Internet search engine companies such as Google (and other companies that gather personal data) must do more. The law in the United States provides very meager protections against government access to personal data when it is in the hands of businesses. In an earlier blog post, I urged businesses to push for legislation to better regulate when the government can obtain data from them.
Hopefully, recent events demonstrate that (1) the argument that anonymized data raises no privacy concerns must be viewed with skepticism, because even anonymized personal data can be traced back to individuals, and (2) businesses that gather and store personal data expose that data to potential government access, and it is in their interest to fight for better legal protections against such access so that they can assure their customers that their data is truly private.
1. Solove, The Google Subpoena Case: A Google Victory (March 2006)
3. Solove, Government vs. Google (January 2006)
4. Solove, Google’s Empire, Privacy, and Government Access to Data (November 2005)
Originally Posted at Concurring Opinions
* * * *
This post was authored by Professor Daniel J. Solove, who through TeachPrivacy develops computer-based privacy training, data security training, HIPAA training, and many other forms of awareness training on privacy and security topics. Professor Solove also posts at his blog at LinkedIn. His blog has more than 1 million followers.