September 11, 2008 7:40 AM PDT

Debunking Google's log anonymization propaganda

by Chris Soghoian

Google announced on Monday that the company will be reducing the amount of time that it will keep sensitive, identifying log data on its search engine customers. To the naive reader, the announcement seems like a clear win for privacy. However, with a bit of careful analysis, it's possible to see that this is little more than snake oil, designed to look good for the newspapers, without delivering real benefits to end users.

In a post to the company blog, Google said that it will significantly reduce the amount of time that it hangs onto identifying user data in its Web server logs:

Today, we're announcing a new logs retention policy: we'll anonymize IP addresses on our server logs after 9 months. We're significantly shortening our previous 18-month retention policy to address regulatory concerns and to take another step to improve privacy for our users.

Hidden further down in the blog post were a few more details:

We haven't sorted out all of the implementation details, and we may not be able to use precisely the same methods for anonymizing as we do after 18 months, but we are committed to making it work.

Google's announcement was extremely light on details, in particular on how the company plans to anonymize the records after 9 months. I contacted Google to find out more, and received an extremely interesting reply:

After nine months, we will change some of the bits in the IP address in the logs; after 18 months we remove the last eight bits in the IP address and change the cookie information. We're still developing the precise technical methods and approach to this, but we believe these changes will be a significant addition to protecting user privacy.... It is difficult to guarantee complete anonymization, but we believe these changes will make it very unlikely users could be identified.... We hope to be able to add the 9-month anonymization process to our existing 18-month process by early 2009, or even earlier.

To understand what this means (and how useless the new privacy "enhancements" are), consider the following:

When a user conducts a search using Google's search engine, the company stores three main types of information in a log file: the user's IP address (a unique network address given to her computer by her Internet service provider), the words that she searched for, and her cookie identifier (a unique value given to every Web browser that visits a Google Web property).

Under Google's existing policy, after 18 months the company "anonymizes" the IP address and cookie information in its log files. While the company hasn't said how it de-identifies the cookies, it has revealed in public statements that its IP anonymization technique consists of chopping off the last 8 bits of a user's IP address.

As an example, an IP address of a home user could be 173.192.103.121. After 18 months, Google chops this down to 173.192.103.XXX.
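The truncation in this example can be sketched in a few lines of Python. This is an illustration only: Google has not published its method, so the function name `mask_low_bits` and the choice to represent "chopped" bits as zeros are assumptions.

```python
import ipaddress

def mask_low_bits(ip: str, bits: int = 8) -> str:
    """Zero the lowest `bits` bits of an IPv4 address (hypothetical helper)."""
    value = int(ipaddress.IPv4Address(ip))
    # Build a 32-bit mask whose lowest `bits` bits are cleared.
    mask = (0xFFFFFFFF >> bits) << bits
    return str(ipaddress.IPv4Address(value & mask))

# The article's example address with its last octet removed.
print(mask_low_bits("173.192.103.121"))
```

Every address that shares the first three octets collapses to the same stored value, which is the only protection this technique offers.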

Since each octet (the number between the periods of an IP address) can hold a value from 0 to 255, Google's anonymization technique allows a user, at most, to hide among 255 other computers. In comparison, Microsoft deletes the cookies, the full IP address and any other identifiable user information from its search logs after 18 months.

Google has now revealed that it will change "some" of the bits of the IP address after 9 months, but fewer than the eight bits that it masks after the full 18 months. Thus, instead of hiding among every other address that shares their first three octets, Google's customers will perhaps be able to hide among 63 or 127 other possible IP addresses (if six or seven bits are changed).
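The arithmetic behind these crowd sizes can be sketched as follows. It assumes that "changing" a bit makes it unrecoverable and that every value of the masked bits is a possible user; in practice some addresses in a block are usually reserved, so real figures are slightly lower. The function name `crowd_size` is hypothetical.

```python
def crowd_size(masked_bits: int) -> int:
    """Number of *other* addresses that share the same unmasked prefix."""
    # 2**masked_bits addresses are indistinguishable; subtract the user herself.
    return 2 ** masked_bits - 1

for bits in (6, 7, 8):
    print(f"{bits} bits masked: hide among {crowd_size(bits)} other addresses")
```

Each additional masked bit only doubles the crowd, so even the full eight bits leave a few hundred candidates at most.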

By itself, this is a laughable level of anonymity. However, it gets worse.

First, remember that Google will not delete or anonymize user cookies from the logs when it slightly smudges IP addresses after nine months. Second, remember that as long as you use a Google Web property at least once every two years, the company will maintain a unique identifiable cookie value within your Web browser.

Thus, consider the following scenario:

In June 2008, a user from 173.192.103.121 with cookie value 12345 conducts a search for "breast cancer risks." Nine months later, in March 2009, the company scrubs some portion of the IP address, perhaps to 173.192.103.1XX. However, the cookie remains in the log.

In April 2009, that same user returns to Google, and conducts a search for "stephen colbert youtube videos," again from the same IP and the same cookie value 12345.

Even though the 9-month-old search logs have been "anonymized", because the cookie values remain, it is trivial to match the newer searches to the older ones, and thus completely reverse the anonymization process.
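The matching step in this scenario amounts to a simple join on the cookie value. The sketch below uses made-up log records in an assumed (date, IP, cookie, query) layout; Google's real log format is not public, and `reidentify` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical log records: (date, ip_as_stored, cookie_id, query)
logs = [
    ("2008-06", "173.192.103.1XX", "12345", "breast cancer risks"),
    ("2009-04", "173.192.103.121", "12345", "stephen colbert youtube videos"),
]

def reidentify(records):
    """Attach every full IP seen with a cookie to all queries under that cookie."""
    full_ips = defaultdict(set)
    for _date, ip, cookie, _query in records:
        if "X" not in ip:  # record whose IP has not been scrubbed yet
            full_ips[cookie].add(ip)
    return [
        (date, query, sorted(full_ips[cookie]))
        for date, _ip, cookie, query in records
    ]

for row in reidentify(logs):
    print(row)
```

The scrubbed June 2008 record comes back paired with the full address from April 2009, because the cookie was never touched.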

The simple truth is that any IP anonymization technique, no matter how strong or weak, is a waste of time if cookie values are not also anonymized.

Unfortunately, Google is relying on the fact that the mainstream media (I'm looking at you, New York Times and Washington Post), along with seemingly most of the technology press, are clueless on these issues. Google's new anonymization policy is totally worthless, and the company deserves to be called out for its deception.


Disclaimer: I interned at Google during the summer of 2006 and received a $5,000 Google fellowship in both 2006 and 2007. I have also interned or worked for both the Electronic Privacy Information Center (EPIC) and the American Civil Liberties Union (ACLU) of Northern California, public interest groups that have been extremely critical of Google's privacy policies.

Christopher Soghoian delves into the areas of security, privacy, technology policy and cyber-law. He is a student fellow at Harvard University's Berkman Center for Internet and Society, and is a PhD candidate at Indiana University's School of Informatics. His academic work and contact information can be found by visiting www.dubfire.net/chris/. He is a member of the CNET Blog Network, and is not an employee of CNET.
15 Comments
by Super2online September 11, 2008 8:08 AM PDT
NICE JOB Chris! I'm very impressed with what you have provided the consumer. Now it's our turn to jump on the bandwagon and demand that Google remove 100% of the IP address AND cookie information at the 9-month mark or sooner. Well, Google, what do you have to say for yourself?
by atici September 11, 2008 8:32 AM PDT
I think your conclusions are heavy handed. If one uses google, one should familiarize themselves with their privacy policies. If one finds it unacceptable, there are always alternative search engines. Also one can for instance use anonymity software (such as Tor). By sending packets to google or anyone on the internet you accept to give away the information in your packets.

The policies mentioned do not make google evil. Google does not have any obligation to people other than those stated in their terms of privacy. They give people a free service, so of course they expect something in return. And if you think you can form a business that not only provides better privacy but also better search results, all the power to you...

What is your recommendation? That the government pass laws to interfere with "evil" entities on the internet? Do you really think that will end up providing better service to the people?

I don't work for google; however, I am a strong supporter of privacy and freedom over the internet (I donate to eff). But by "privacy" on the internet I understand that the information you did not give away cannot be demanded from you, nothing more. And by freedom on the internet, I understand that every entity on the internet is free, including google.
by CooperWBC September 12, 2008 11:07 AM PDT
I think we all can agree that privacy on the internet is a laughable idea at times, but the point of this article isn't to point out how bad privacy is. The point is to show how google is deceiving its users with "Feel Good" procedures.

Think if you had to give someone your address, phone number, SSN, CC# or any other piece of private information. They assure you that the information will be shredded after you leave to make sure they cannot retain it and misuse it. Then you come to find out that all they do to 'shred' the paper is to rip it in half once down the middle. It doesn't take a rocket scientist to find the two parts, put them back together and reconstruct the information, making the 'shredding' useless.

This is what google is doing. With a smile they are telling you, 'oh your information is completely private after this date', but really behind the scenes it doesn't even matter. While I would like more privacy on the interwebs, I realize that is kind of futile... but I do have a big problem with corporations obfuscating the truth.
by BenjaminWright September 11, 2008 4:46 PM PDT
If Google can assert its legal terms just by publishing them (on something less than its homepage), then maybe users can assert their own terms of privacy protection just by publishing them! A user might say in her published terms of service that search engines cannot keep records of her searches longer than 2 weeks. What do you think? --Ben http://hack-igations.blogspot.com/2008/05/google-privacy-policy-terms-of-service.html My ideas are not legal advice for anyone, just something to discuss.
by atici September 11, 2008 6:37 PM PDT
Sure, that makes a lot of sense. You could also go to a grocery store and offer half the money for what you purchased.

What would actually be reasonable is search results that you pay real money for in exchange for no cookie/ip tracking or ad delivery. I believe that would be possible relatively soon.
by dannysullivan September 12, 2008 7:28 AM PDT
They deserve (as do Microsoft and Yahoo, where the same issues apply) to be called out if they're not anonymizing cookies as well. But is that the case? You've said Google hasn't identified how it will anonymize cookies -- not that it won't do this at all.

I covered these same issues when the program was first announced last year:
http://searchengineland.com/070314-180307.php

In that, I spent some time addressing the cookie issue as well. And my understanding was that these too would get anonymized.

I agree, I'd like to see a lot more details out there about what's actually happening, how far back they've gone already, etc. And if cookies aren't getting anonymized, yep, that's not helping the overall goal. But you don't know this is the case.
by buggermenot September 12, 2008 11:02 AM PDT
Why would I allow Google to set a permanent cookie? Or do you have evidence that they profile my browser so that they can link it to past sessions?
by avj24 September 13, 2008 4:16 PM PDT
Good article, Chris. Google has always demonstrated this behavior with regard to privacy; the company believes that it and its so-called algorithm should not be questioned.

Google has always put, and will continue to put, its revenue stream ahead of its users' privacy; that is the main reason for their spyware product called Chrome.

Google operates under the stated policy that GOOGLE knows best.
by clixx13 September 14, 2008 11:44 AM PDT
@buggermenot:
You don't have to consent to it; it's how their search engine works.
@atici
In your first paragraph, you highly underestimate the abilities of the average web-surfer. The everyday Googler or the casual yahoo-emailer. Not everyone knows about Tor; most people aren't even aware there's a reason to use applications like it. You also seem to assume that everyone on the internet knows what's being taken from them- a majority of them don't. It's a simple fact. Moving past that, though..
In your second paragraph, I really lost you, though. Why is it crucial to keep unique identifying information of every searcher in order to provide search results? No, Google is not "evil", and no, they do not have any obligation to the user beyond what is stated in their Privacy Policy. However, what is happening here is this: Google is pulling the wool over the eyes of the everyday internet user. It promises a higher level of anonymity, yes, but how high? "High", as in completely anonymous, or "High" as in, I'll be able to stand in a line of 200 people and hope someone doesn't pick me? That's the distinction here. As Chris said, Microsoft completely deletes ALL identifying information in the time frame within which Google is promising that, perhaps, you can stand in a crowd without being noticed.

I don't think Google is evil, but these tactics are scary, coming from such a reputed and supposedly upstanding company. Why do they need this information, why are they seemingly fighting so hard to keep it, and why won't they just come out and SAY what they are doing instead of waiting for someone like Chris to tell us?
by mburdett September 14, 2008 4:15 PM PDT
I spoke to Google about their new privacy policy last Wednesday -- see http://www.indybay.org/newsitems/2008/09/10/18534988.php -- and found that they will not be anonymizing logs of users who are logged in to their Google account. Only logs pertaining to unauthenticated users of the search engine are affected by the new policy. Anyone using Google services while logged in to an account could expect their logs to be retained by Google forever and available to third parties via subpoena, search warrant, or other court order.
by invenio September 14, 2008 7:16 PM PDT
Two notes from my end:
1) I usually get a new IP address assigned from my ISP every time I re-connect to their servers. However, Chris, if I read your article right, it shouldn't matter how much of an IP address is retained, since the cookie will always identify you (if you conduct a search again within two years, and who doesn't?).

2) Couldn't we all circumvent Google's cookie retention policy by simply deleting cookies at the end of a session? I have set Firefox to delete them when I close it down.
by slackgen September 26, 2008 6:50 PM PDT
I dont think Google is evil, but these tactics are scary . http://www.onlineflashgames.org
by abdul0k27 December 29, 2008 11:51 AM PST
These policies mentioned do not make google evil. Google does not have any obligation to people other than those stated in their terms of privacy.
regards,
Biggest Loser Fan (http://www.thedayoffdiet.com/)
by marc72 August 4, 2009 2:43 PM PDT
to chris,

The article about Google was enlightening as well as fortifying my fears about them. I read some rather frightening news accounts: "Google retains every keystroke the public makes, bundles them and has sold millions of data lines to the Mexican government as well as Russia and China." Then someone turned me on to a "Google + evil" search.

I have recently started using BING exclusively. My concern about Microsoft's policy set me to researching. This is a query I sent to Searchengineland, asking if they were operating differently than Google.
******************************************************

The reply: Well, Google doesn't tell your every keystroke to governments. In fact, they actually fought the US government in a well publicized case to prevent a demand for some data back in 2006, and won.

Google also deletes data and was, in fact, the first of the major search engines to agree to a destruction plan like this. My last update on the situation is here:


http://searchengineland.com/anonymizing-googles-server-log-data-hows-it-going-15036


That covers the situation with Microsoft. They destroy data, as well. Neither will destroy it immediately after you leave their site, but in a few months, it's supposed to go. Unless you log into them, they know virtually nothing about you. An IP address which isn't that revealing and can change depending on your ISP. A cookie that you can refuse if you configure your browser. The services work just as well without them.
****************************************************
My reply back was why any corporation, while providing a search service (and using our existence to satisfy advertisers), would suddenly have dominion over our search event. 9 months to a year and a half is a long time.
And why, after I might search for a beach umbrella, my phone is suddenly inundated with telemarketer calls from suntan lotion, bathing suit, beach front properties and sandbox salesmen! Maybe that's the cost of getting a free search.

Thanks for the enlightenment.

Marcus