News, Developments, and Insights


Google Subpoena and Privacy Case

On Friday, Judge James Ware, a U.S. District Judge in San Jose, CA, issued a decision in Gonzales v. Google, Inc., No. CV 06-8006MISC JW (Mar. 17, 2006), the case involving a government subpoena for Google search queries. A few days before Judge Ware released his opinion, he stated that he would be ordering Google to turn over some information, though not everything that the government was demanding. Media reports indicated a victory for the government, as these headlines suggest: “Judge Siding With Feds Over Google Porn Subpoena” (AP) and “Google Faces Order to Give Up Records” (Boston Globe).

But Judge Ware’s written decision strikes me as much more of a victory for Google and privacy than for the government.

The subpoena was issued because the government wanted information for use in ACLU v. Gonzales, No. 98-CV-5591, pending in the Eastern District of Pennsylvania. That case involves a challenge by the ACLU to the Child Online Protection Act (COPA), 47 U.S.C. § 231. Google wasn’t even a party to that case, but the government subpoenaed from Google (1) URL samples: “[a]ll URL’s that are available to be located to a query on your company’s search engine as of July 31, 2005” and (2) search queries: “[a]ll queries that have been entered on your company’s search engine between June 1, 2005 and July 31, 2005 inclusive.” Subsequently, the government narrowed its URL sample demand to 50,000 URLs, and it narrowed its search query demand to all queries during a one-week period rather than the two-month period mentioned above. Google still raised a challenge, and the government again narrowed its search query request to only 5,000 entries from Google’s query log.

Under Federal Rule of Civil Procedure 26, a subpoena may be quashed if the “burden or expense of the proposed discovery outweighs its likely benefit.” The court (Judge Ware) began by analyzing the government’s request for a URL sample, pointing out the paucity of the government’s explanation for its need for the information. The court observed:

The Government’s disclosure of its plans for the sample of URLs is incomplete. The actual methodology disclosed in the Government’s papers as to the search index sample is, in its entirety, as follows: “A human being will browse a random sample of 5,000-10,000 URLs from Google’s index and categorize those sites by content” and from this information, the Government intends to “estimate . . . the aggregate properties of the websites that search engines have indexed.” The Government’s disclosure only describes its methodology for a study to categorize the URLs in Google’s search index, and does not disclose a study regarding the effectiveness of filtering software. Absent any explanation of how the “aggregate properties” of material on the Internet is germane to the underlying litigation, the Government’s disclosure as to its planned categorization study is not particularly helpful in determining whether the sample of Google’s search index sought is reasonably calculated to lead to admissible evidence in the underlying litigation.

One would think, after reading this paragraph, that the government has failed to establish a justification for the URLs. Nevertheless, the court attempted to “imagine[]” and “envision” a possible use for the information the government is seeking. The court then concluded that it would “give[] the Government the benefit of the doubt.”

This was the partial victory that the government won, and it wasn’t a very big victory. The second half of the opinion was all Google. This latter part of the opinion dealt with the government’s demand for search queries — the part of its demand that implicated privacy. The court rejected the government’s request for the search queries — even after the government had repeatedly backed away from its initial demands. The government had begun by demanding two months’ worth of search queries (constituting millions of queries); it then backed down and demanded queries for just a one-week period (a substantial number of queries); and it recently had further retreated to asking for just 5,000 queries. This was a dramatic retreat, but the court still sent the government packing.

According to the government, it planned to use the search queries as follows: “A random sample of approximately 1,000 Google queries from a one-week period will be run through the Google search engine. A human being will browse the top URLs returned by each search and categorize the sites by content.” The court, without much analysis, concluded that “were the Government to run these URLs through the filtering software and analyze the results, the information sought would be reasonably calculated to lead to admissible evidence.” Although the court ultimately denied the government’s demand, I wonder whether the court should have so quickly conceded the government’s need for the information. Why couldn’t the government just create its own search queries and run them through Google’s search engine? Why did it need a sampling of people’s searches? It could certainly conduct a study of how various searches work with filtering software by using its own queries. Moreover, the fact that the government had begun with a wildly broad request and narrowed it significantly should at least spark some skepticism about whether the government was engaging in a fishing expedition.
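The government’s proposed study, as described in its papers, amounts to a simple sampling-and-categorization pipeline. A minimal sketch of that process (the search and categorization functions here are hypothetical stand-ins, not anything from the actual study):

```python
import random

def run_search(query):
    """Hypothetical stand-in for submitting a query to a search
    engine and collecting the top result URLs."""
    return [f"http://example.com/{query.replace(' ', '-')}/{i}" for i in range(10)]

def categorize(url):
    """Hypothetical stand-in for a human reviewer labeling a
    site's content after browsing it."""
    return "explicit" if "adult" in url else "benign"

# Sample ~1,000 queries from a one-week log, as the government proposed.
week_of_queries = ["weather san jose", "adult movies", "glee club tickets"] * 400
sample = random.sample(week_of_queries, 1000)

# Categorize the top results for each sampled query.
results = {q: [categorize(u) for u in run_search(q)] for q in sample}
```

Note that nothing in this pipeline requires *real* users’ queries as input; the same study could be run on a list of queries the government composed itself, which is the heart of the skepticism above.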

The court then turned to the considerations on the other side — the costs and burdens of Google’s production of the information. Google argued that it would lose user trust if compelled to reveal the searches. The court began by using Google’s privacy policy against it, stating: “Google’s privacy policy does not represent to users that it keeps confidential any information other than ‘personal information.’” The court then noted:

However, even if an expectation by Google users that Google would prevent disclosure to the Government of its users’ search queries is not entirely reasonable, the statistic cited by Dr. Stark that over a quarter of all Internet searches are for pornography indicates that at least some of Google’s users expect some sort of privacy in their searches. The expectation of privacy by some Google users may not be reasonable, but may nonetheless have an appreciable impact on the way in which Google is perceived, and consequently the frequency with which users use Google.

The court concluded that the government did not need both the URL samples and the search queries, and it required only the disclosure of the URL samples but not the search queries. The court concluded that “the marginal burden of loss of trust by Google’s users based on Google’s disclosure of its users’ search queries to the Government outweighs the duplicative disclosure’s likely benefit to the Government’s study.”

Beyond Google’s argument about customer goodwill, the court also raised general privacy concerns as a public policy interest implicated by the subpoenas. In Rule 26, “considerations of the public interest, the need for confidentiality, and privacy interests are relevant factors to be balanced.” Gill v. Gulfstream Park Racing Association, 399 F.3d 391, 402 (1st Cir. 2005). The government argued that it was only requesting the text of the search queries entered, not the identities of the users who entered them, and that therefore there was no privacy interest. But the court concluded:

Although the Government has only requested the text strings entered, basic identifiable information may be found in the text strings when users search for personal information such as their social security numbers or credit card numbers through Google in order to determine whether such information is available on the Internet. The Court is also aware of so-called “vanity searches,” where a user queries his or her own name perhaps with other information. . . . Thus, while a user’s search query reading “[user name] stanford glee club” may not raise serious privacy concerns, a user’s search for “[user name] third trimester abortion san jose,” may raise certain privacy issues as of yet unaddressed by the parties’ papers. This concern, combined with the prevalence of Internet searches for sexually explicit material — generally not information that anyone wishes to reveal publicly — gives this Court pause as to whether the search queries themselves may constitute potentially sensitive information.
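The court’s point — that “de-identified” query text can itself carry identifying information — is easy to demonstrate. A minimal sketch (the patterns and sample queries are illustrative only; real PII detection is far more involved than two regular expressions):

```python
import re

# Illustrative patterns for identifiers that can appear inside query text.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

def flags_pii(query: str) -> bool:
    """Return True if an ostensibly anonymous query string still
    carries obvious personal identifiers."""
    return bool(SSN_RE.search(query) or CARD_RE.search(query))

# Hypothetical query log entries, no user names or IPs attached.
queries = [
    "weather san jose",
    "is 123-45-6789 on the internet",     # SSN-style vanity check
    "4111 1111 1111 1111 fraud report",   # card-number-style check
]
hits = [q for q in queries if flags_pii(q)]
```

Even with all account information stripped, the second and third queries reveal exactly the kind of sensitive data the court was worried about.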

Moreover, there is the problem of what I’ll call the subpoena two-step. Step One is using a subpoena to get a bunch of de-identified search queries. Then, if the government discovers search queries it deems “suspicious,” it can use a subpoena (Step Two) to get any identifying information (e.g., an IP address, which can be linked to a user’s identity via ISP records, and even sometimes a user’s name if a user has an account with Google). I blogged about this possibility in an earlier post on this case. The court also appeared to recognize this problem:
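The two-step is, mechanically, just a join between two data sets the government can subpoena separately. A minimal sketch of how a “de-identified” query log plus ISP subscriber records re-identifies a user (all records here are hypothetical):

```python
# Step One: the subpoenaed query log is "de-identified" but still
# carries IP addresses (hypothetical records).
query_log = [
    {"ip": "203.0.113.7", "query": "bomb placement white house"},
    {"ip": "198.51.100.4", "query": "stanford glee club"},
]

# Step Two: a follow-up subpoena to the ISP maps IP addresses to
# subscribers (hypothetical subscriber records).
isp_records = {"203.0.113.7": "J. Doe, 12 Elm St."}

# Queries deemed "suspicious" are pulled out and joined to identities.
suspicious = [r for r in query_log if "bomb" in r["query"]]
identified = [(r["query"], isp_records.get(r["ip"])) for r in suspicious]
```

The privacy protection of Step One evaporates the moment Step Two supplies the join key, which is why the de-identification argument alone did not satisfy the court.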

Even though counsel for the Government has assured the Court that the information received will only be used for the present litigation, it is conceivable that the Government may have an obligation to pursue information received for unrelated litigation purposes under certain circumstances regardless of the restrictiveness of a protective order. The Court expressed this concern at oral argument as to queries such as “bomb placement white house,” but queries such as “communist berkeley parade route protest war” may also raise similar concern. In the end, the Court need not express an opinion on this issue because the Government’s motion is granted only as to the sample of URLs and not as to the log of search queries.

The court explicitly did not address the ECPA argument made by Google.

Overall, I view this opinion as a victory for information privacy. The government was not entitled to obtain information about people’s search queries — even under its much narrower request for a sample of 5,000 of them.

UPDATE: Philipp Lenssen at Google Blogoscoped points out: “In retrospect, this decision also shows that MSN, Yahoo and others gave away their search logs even when they didn’t have to – even when they could’ve successfully and legally opposed to it.”

Related Posts:

1. Solove, Do No Evil and Perhaps Do Some Good: Google, Privacy, and Business Records

2. Solove, Government vs. Google

3. Solove, Google’s Empire, Privacy, and Government Access to Data

Originally Posted at Concurring Opinions

* * * *

This post was authored by Professor Daniel J. Solove, who through TeachPrivacy develops computer-based privacy training, data security training, HIPAA training, and many other forms of awareness training on privacy and security topics. Professor Solove also posts at his blog at LinkedIn. His blog has more than 1 million followers.

Professor Solove is the organizer, along with Paul Schwartz, of the Privacy + Security Forum and International Privacy + Security Forum, annual events designed for seasoned professionals.

If you are interested in privacy and data security issues, there are many great ways Professor Solove can help you stay informed:
LinkedIn Influencer blog
