Why the big fuss over some AOL data?

by Mathew on August 7, 2006 · 4 comments

I know I’m late to the party on this one, but I have to say I’m kind of confused about the huge outcry over AOL releasing a pile of supposedly “secret” personal data as part of what seems to be a misguided research effort, something for which it has now apologized. Obviously it’s a bit of a public relations gaffe, in the sense that the information kind of leaked out and everyone got all hot and bothered about it before there was a response, but I don’t really see why it became such a giant sh*tstorm in the first place.

Yes, there was plenty of personal information in the search logs (all two gigabytes of them or whatever it worked out to) that someone could theoretically do something nefarious with — theoretically — but doesn’t that pretty much describe the Internet? As more than one person has pointed out, including one of the commenters on a related post by Paul Kedrosky, much of this information is already available to people who are determined to get it (or pay for it). Didn’t Scott McNealy tell us that we don’t have any privacy on the Internet and that we should get over it?

Markus Frind of Plenty of Fish has said that the personal search data includes some potentially disturbing info, such as a repeated search by one user for information on how to kill your wife. But how do we know that this person was actually looking to kill his wife? Maybe he’s writing a screenplay. He also apparently searched for “steak and cheese” and “poop,” or at least the same user ID did. What does that indicate about the user’s overall mental state? Probably nothing, although I’ve often thought that people who eat steak and cheese are inherently unbalanced. (There’s more data surfing here.)

As far as I’m concerned, setting off all kinds of personal privacy alarms over the AOL data is an overreaction. As Greg Yardley points out, we have no privacy when it comes to Internet searches, and the sooner we get used to it the better. Greg Linden of Findory also seems to think the whole fuss is overdone, and so does venture capitalist Jeff Nolan.

Update:

I wrote a similar post to this one for my “official” blog at globeandmail.com/blogs/geekwatch and judging by the comments that this post has sparked, I seem (or AOL seems) to have struck a nerve when it comes to personal privacy and the Internet. One reader mentioned the recent New York Times article about the AOL debacle, in which a journalist tracked down a woman living in a small town based on some of the searches she did, which included personal information such as the name of the town — and another reader didn’t think much of my counter-argument that this says more about the journalist and the newspaper that tracked her down than it does about AOL’s release of the data. Some readers also thought my comparison to Rogers releasing information about pay-per-view rentals was spurious, since that wouldn’t include personal data (and I admit it’s not a great analogy).

As I mentioned in my response to those comments, I realize that there is a lot of information included in what AOL released, and that by putting two and two together (as the NYT did) someone could come up with a pretty good idea of who did those searches. I guess the point I was trying to make (and perhaps I went a little overboard in doing so) is that much of that information is already out there, and is effectively publicly available. If you type in your name or address or credit card number, it can be tracked and accessed, and while it takes a little more effort and know-how than sifting through AOL’s search data, it doesn’t take a whole lot more. Privacy of information on the Internet is not black and white; it is shades of grey. I guess that was my point.

Law professor and blogger Michael Geist, whose opinion I respect, says that I am wrong and that the AOL incident illustrates why such search data should not only not be released but shouldn’t even be kept. John Battelle says that he was secretly thrilled at the New York Times story because “the silver lining of a data leak like this is that it allows the culture to have a conversation about what we’re getting into here by tracking all this data.”

  • http://www.aolsearchlogs.com/forum/ AOLSearchLogs.com

    We created a web interface to the data. It is available at http://www.aolsearchlogs.com/

  • http://peterdawson.typepad.com /pd

    Matt, I think that privacy is the main issue here.. and not its fair to over react, because it’s possible to find out stuff about a user. Typical e.g.: the NYT reveals the ID and pic of user 4417749. This is a casual reverse engineering effort in progress!!

    Just imagine what else others can data mine within the data set?? Our initial pentest boxes reveal a lot of interesting information!!
