My Thoughts on Google Web History

Last week, Google enabled Personal Web History.
And of course, everyone freaks out about privacy concerns. Oh no! Google knows what I do on the Internet!

OK, so if you haven’t been paying any attention at all, I understand that this might be a surprise. If you’ve been paying a little attention, it’s not that surprising, but maybe a little concerning.

What I’d like to do here is discuss a number of Google’s products and how they relate to collecting personal information and integrating it into their product and their business model.

=== if you’ve been “paying attention,” skip ahead ===

The Product:
Google’s value as a search engine is, in a sense, knowing what you are looking for. How does it know what you are looking for? By analyzing more data than you can shake a stick at! And where does that data come from? Classically, it was by analyzing the link structure of the web, etc, etc. But it’s been 10 years since then, and they must be doing some smarter things. One invaluable asset they own and continue to collect is your clickstream, or what you click on and the sites you visit. If they know that, they know how to tailor their results.

The Business:
Google’s value as an amazingly successful company is their advertising platform. And what’s the value there? What makes it so much better than everyone else? Contextual ads. Text ads that are in context with what you are looking at. And of course the holy grail of contextual advertising is knowing what the user wants to do (say, buy a digital camera) and providing them the best link to accomplish that goal.

What does this all have to do with the Google Web History? Well, it boils down to what John Battelle calls the Database of Intentions. If Google knows and understands my surfing habits, both individually and in aggregate with the rest of the web, they can deliver the best possible results to me.

When I search for “tomcat”, am I searching for the fighter jet, the animal, or the Java application server? They *know* it’s the latter. You better believe I’m not going to get ads for jets. I’m going to get ads that say “Google is hiring Java programmers!”
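To make the “tomcat” example concrete, here is a toy sketch in Python of how a click history could tip the scales between the three meanings. Everything in it (the `SENSES` table, the topic keywords, the `guess_sense` function) is made up for illustration. I have no idea how Google actually does this; it is just the simplest version of the idea.

```python
from collections import Counter

# Hypothetical mapping from each sense of the query to a few topic keywords.
SENSES = {
    "fighter jet": {"aviation", "navy", "military"},
    "animal": {"cats", "pets", "veterinary"},
    "Java application server": {"java", "servlet", "programming"},
}

def guess_sense(history_topics):
    """Pick the sense whose keywords overlap most with topics from past clicks."""
    counts = Counter(history_topics)
    scores = {
        sense: sum(counts[word] for word in keywords)
        for sense, keywords in SENSES.items()
    }
    return max(scores, key=scores.get)

# Topics of pages this (imaginary) user clicked on recently.
my_history = ["java", "programming", "java", "servlet", "pets"]
best = guess_sense(my_history)
print(best)
```

With a history full of Java pages, the application server wins easily. Swap in a history of aviation pages and the same query resolves to the jet.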

==== End background information ====

OK, many of you already know all of that. No big whoop. So where’s the interesting part? How do they get all this data?

I’m going to break this discussion down into two parts.

1. The (relatively) obvious stuff – the things Google knows about you because you straight-up type it into a search box on their web site

2. The stuff that they know about you that you probably didn’t realize they did. That’s the interesting part.

Let’s just review a bunch of their products and see how each one breaks down with respect to those two criteria.

Google Search:
This one is pretty straightforward. They know everything you search for. Obvious. They also know every link you click on. That’s not very surprising, but not quite as obvious to most people. And they also know when you click it. So basically they know every single web page you go to from their search page. That’s a great start in aggregating data. But there is so much more data to gather from the online activities that don’t start at the Google home page!
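How would a search engine know which result you clicked, and when? One common approach is to route the click through a redirect that logs it first. The sketch below assumes that approach. The `search.example` endpoint, the parameter names, and the `CLICK_LOG` list are all hypothetical and are not a description of Google’s actual plumbing.

```python
import time
from urllib.parse import parse_qs, urlencode, urlparse

CLICK_LOG = []  # stands in for the search engine's click database

def tracking_link(destination, query, rank):
    """Wrap a result URL in a hypothetical redirect endpoint."""
    params = urlencode({"q": query, "rank": rank, "url": destination})
    return "https://search.example/click?" + params

def handle_click(link, cookie_id, now=None):
    """Log who clicked what and when, then return the URL to redirect to."""
    params = parse_qs(urlparse(link).query)
    destination = params["url"][0]
    CLICK_LOG.append({
        "cookie": cookie_id,
        "query": params["q"][0],
        "rank": int(params["rank"][0]),
        "url": destination,
        "time": time.time() if now is None else now,
    })
    return destination

link = tracking_link("https://tomcat.apache.org/", "tomcat", 1)
target = handle_click(link, cookie_id="abc123", now=1177000000)
print(target)
print(CLICK_LOG)
```

The user still lands on the page they wanted. The only difference is that one log entry (query, result position, destination, time) got written on the way.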

Next up is AdSense – Google’s ad platform that lets publishers (say, bloggers, geeky humor websites, etc) put ads on their site and get paid when people click. At first glance, it seems like a win-win for everyone. Google gets paid. The publisher gets paid. And the advertiser gets targeted traffic.

But there’s one more subtle thing going on here. Something that was pioneered back in the day by DoubleClick. Google *knows* that I’m reading, say, the New York Times. How? Well, they are the ones showing me the ads. If they serve up an ad, they know who is viewing it. That’s just the way the Internet works*. So even though I didn’t get there through Google, they still know. They know tons of pages that I go to. Every time you see a Google ad, Google knows you’ve seen it, what page you saw it on, and what time you saw it.

(* Small technical note: How do they know you see it? Google uses a cookie in your browser. You keep the cookie b/c that’s how you stay logged into Gmail and how they know your search preferences and remember your zip code. Everyone uses cookies. If you have a Google cookie, any time you browse a page that has Google content, they know. The real value in the cookie is the federated network. If you want, you can disable “3rd party cookies” in the Preferences in your browser, which is exactly what these are called. Yahoo has good details about the practice).
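If the cookie mechanics in that note feel abstract, here is a tiny Python simulation. It is a sketch, not anyone’s real ad server: the `AdServer` class, the `visitor-N` cookie format, and the example URLs are all invented. The point is only that one cookie plus content embedded on many sites adds up to a browsing history.

```python
import itertools

class AdServer:
    """Toy third-party server whose content is embedded on many sites."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.views = {}  # cookie id -> list of pages where content was shown

    def serve(self, page_url, cookie=None):
        """Handle one request for embedded content; returns the cookie to keep."""
        if cookie is None:
            # First time we see this browser: hand out a fresh identifier.
            cookie = f"visitor-{next(self._ids)}"
        self.views.setdefault(cookie, []).append(page_url)
        return cookie

ads = AdServer()
cookie = ads.serve("https://news.example/front-page")  # first visit sets the cookie
cookie = ads.serve("https://blog.example/post/42", cookie)
cookie = ads.serve("https://shop.example/cameras", cookie)
print(ads.views[cookie])
```

Three unrelated sites, one cookie, one list. Turning off 3rd party cookies amounts to calling `serve` with `cookie=None` every time, so each visit looks like a brand new visitor.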

Google Analytics:
So everyone who runs a website wants to know how many people visit. I sure do. The best free way to track this? Google Analytics. Paste 2 lines of code into your web site and you instantly have some of the best analytics & stats for your site, completely free. Tres cool!

Where’s the rub? Why does Google do this? (Yes, I know it’s a great service to their advertising partners aiming to raise their click through rates, ROI, etc). But where’s the hidden value? Google now knows everyone who visits a site that uses Google Analytics. Just like that. I currently run about 10 different web sites. For whatever reasons, only 2 of them run Google ads. But all 10 run Google Analytics. Google knows every single person that comes to any of my sites.

Google Toolbar:
The toolbar is awesome. Popup blocker. Autofill. SMS. Custom search. Auto-Suggest. Completely free. Who wouldn’t want that? It’s such a great product. The catch? They know every site you go to when using the Toolbar, regardless of your search engine. Now you start to see why they push so hard to get the toolbar out there. (Of course it also has something to do with the fact that 9x% of people don’t change their default settings, and if they are the default search box, they get search traffic. If MSN search is the default, they don’t. Which is why they pay so much money to Dell, Mozilla Firefox, Opera, etc to get the toolbar & search box pre-installed.)

Gmail:
This one has a few interesting side effects. First, it all but guarantees that I’m keeping my Google cookie around forever. Who wants to sign in to check mail every time? I certainly don’t.

When Gmail first came out, the privacy advocates were up in arms about Google’s algorithms serving context sensitive ads based on the content of the message. They completely missed the point!

You want to hear the tinfoil hat privacy problems with Gmail? Google knows my relationship with every single person in my address book. Oh, and remember when Gmail was invite only? And to prevent spammers from abusing the system, they made you verify your identity with an SMS text message? One account per SMS? Well blam-o! They can now tie you to a phone number!

Google Maps & Local:
Well, they certainly know where I live. “Save this address” was the first thing I did (after drooling over the AJAX & satellite imagery, of course). And thanks to Local search, they know I really like Thai food. And since I click-to-call my cell phone, they can continue to keep tabs on my cell phone number. And they know that I use a Treo and I have Verizon.
You think all those Google Maps mash-ups are free? Think again. There are currently tens of thousands of mash-ups of data with maps all over the web. And you know what? Google gets all that data fed right back into their database. They know who’s looking at it, and who’s clicking on what. It’s just like Analytics & AdSense. More federated networks.

Google Talk:
Physical presence. Google now knows *when* I’m online and whether I’m active or not. 24 hours a day monitoring. And for those of you who don’t really IM, let’s add this online presence to Gmail too so we can track it there.

Google Mobile Search:
I think this one is particularly brilliant. Ever surf the web on your cell phone? Google is a good place to start. And what do they do? They realized that most of the internet looks like crap on a cell phone. So they proxy all the web pages through their servers and reformat them to make them look really nice on the tiny screen. For free. I usually go to Google, even if I know the URL, just to get things formatted for mobile.
The catch? All your web traffic is funneled through Google again! It’s tough to install a toolbar on a phone browser. But they’ve done it again! They know every page I visit on my phone!
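Why does a reformatting proxy see every page, and not just the first one? Because a proxy that rewrites a page can also rewrite the links in it, so the next click comes back through the proxy too. Here is a minimal Python sketch of that idea. The `proxy.example` endpoint, the `SEEN` log, and the `fake_fetch` stand-in are hypothetical, and I am assuming link rewriting is how it works, which seems like the obvious way to build it.

```python
import re
from urllib.parse import quote

PROXY = "https://proxy.example/fetch?url="  # hypothetical proxy endpoint
SEEN = []  # pages the proxy has fetched on the user's behalf

def rewrite_links(html):
    """Point every absolute http(s) link in the page back at the proxy."""
    def wrap(match):
        return 'href="' + PROXY + quote(match.group(1), safe="") + '"'
    return re.sub(r'href="(https?://[^"]+)"', wrap, html)

def proxy_fetch(url, fetch):
    """Fetch a page for the user, record the visit, and rewrite its links."""
    SEEN.append(url)
    return rewrite_links(fetch(url))

def fake_fetch(url):
    # Stand-in for a real HTTP request so the sketch runs offline.
    return '<a href="http://example.org/next">next</a>'

page = proxy_fetch("http://example.org/start", fake_fetch)
print(page)
print(SEEN)
```

Once the first page is served this way, the user never has to visit the proxy on purpose again. Every link they tap already points at it.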

Google Reader:
I read dozens (hundreds?) of feeds in Bloglines, my feed aggregator. I don’t surf the web for news. I don’t store bookmarks and go to them every morning. I just read my feeds. What’s the problem there? Bloglines (IAC) is in the middle. No cookies. No Google ads. Nothing. Suddenly, Google doesn’t know what I’m reading anymore. Can’t serve me ads. Can’t get at my clickstream.
So what do they do? Build a kickass Feed Reader, make it better than all the competition, and get people switching! Makes a lot of sense now, huh?

Google Desktop:
A search index of all the content on your local hard drive. Awesome. Next step? Synchronize this data across your computers. Of course you want to search from your laptop. And maybe while you’re at the office. How do you do this synchronizing? Easy! You upload the entire index of your hard drive to Google’s servers! Now they have access to all that data! Brilliant!

Google Checkout & Google Finance:
You mean other than getting my credit card number and knowing what’s in my 401K? I don’t see too much benefit to Finance, but I do see a ton of great data coming through Google Checkout. In addition to the obvious cash cow it could (potentially) become for them, they are now getting hard data about who buys what, when, and for what price. Cool.

Orkut & Dodgeball:
Know my friends. Know where I am in the physical world at any given time. Could be useful.

Google Browser Sync:
Want to keep your bookmarks synchronized at work, home, and on the laptop? Make sure you have the same extensions installed in both places? Hell, why not keep all the same tabs open in both places! Guess how this is done? You guessed it – you send every open web page, every bookmark, every extension, right up to Google’s servers. Clever girl!

Google Calendar, Google Docs & Spreadsheets:
To tell you the truth, I see a ton of strategic value in all these products for Google, but I haven’t really gotten my head around any alternative data gathering motivations. I think it’s a brilliant business strategy, and you better believe they will make serious bank with these, especially with branded Apps for your Domain, but I don’t see any other snooping privacy issues other than the glaringly obvious ones.

Free Google WiFi:
Use free Google wifi in California? Of course all the traffic goes through their servers. They are your ISP! They know every site you visit.

Google Web Accelerator:
Not a very well known one – this is a little proxy & caching server to speed up your web surfing experience. The result? All your traffic flows through their servers.

Google Secure Access:
I think this product is no longer available, but for a while you could use Google as a secure proxy server when you were in a public wi-fi hotspot. That way no one could snoop on what you were doing. A pretty nice free service. But again, they are getting every bit of traffic data that you surf.


So what’s my point here? This isn’t conspiracy theory or paranoia or a privacy rant. Google offers more amazing free services than any other company, and I use almost all of them.

What I find frustrating is that people don’t *realize* what goes on with their data and their information. 99% of people don’t understand these concepts. Maybe they care and maybe they don’t. Not for me to judge. But what gets my goat is the sudden outcry! “Oh no! Google has a web history! They know where I surf!” Of COURSE they know where you surf. Now they’re just SHARING it with you! You should have been way more pissed off for the last 8 years when they WEREN’T sharing it.

It sort of reminds me of when Google refused to turn over search records to the DoJ. And all the media spin was about Google having search records stored. The fact that MSN, Yahoo and AOL silently turned over all the data without a fight was completely glossed over in the MSM! Frustrating!

Some other thoughts if you’re still reading:

ISPs: Don’t think for a second that Google is the only company that can and does do this. No one does it nearly as well, but you better believe that your ISP (Comcast, Time Warner, AOL, Cox, etc) can see every single packet that you send and receive. Every single one (except for the contents of traffic to secure sites starting with https). They don’t have the same coverage or breadth or application level understanding as Google, but they still get every packet.

I don’t work for Google. Lots of this may be considered speculation. I obviously don’t know how they analyze this data. Or what the value of my open Firefox tabs & cell phone number is to them. All I know is the features in their products, a bit about how they work, and what _I_ would do if I had all this data to play with :)

So am I concerned? Does this bother me? Do I trust Google? Well, let me first say that it really doesn’t matter what I think. By the very nature of living online, you surrender all of this data to someone. If you switch search engines, you’re giving it to someone else.

To be honest, I’m currently much more concerned with Equifax, with who has copies of my hospital records, and with who makes our voting machines than I am with Google. How often does the government seize computers for forensic evidence? Is that more or less scary to you?

Yes, I would like to see more transparency (see Attention Trust). Yes, I would prefer more data portability (Google Web History alone is such a great feature that it has me locked in – I can no longer switch search engines).

Notable Missing Topics

Because I have to actually get some work done, some pieces I wanted to touch on are going to have to wait until next time. Specifically, Google’s Privacy Policy and what that all means. Other interesting related topics include anonymizing search records, deleting (or not deleting) data when the user requests it, releasing data in aggregate, and the security measures taken to ensure data cannot be stolen (including by employees).

Also, I only really covered things that affect consumers – I didn’t get into any discussion about the data that they can gather from AdWords, such as optimal pricing on keywords.


Giving up my privacy is a decision that I have made. It’s something I think about often and am cognizant of when surfing the web. Oh, and more importantly, I try to keep my confidential information (mostly finance stuff) confidential.

Oh, and I am careful to use my wife’s computer when listening to Lily Allen, Kennyb’s computer when surfing for pr0n, and my neighbor’s wifi when downloading Heroes :)

