Google and the end of everything


My choice for this weekend’s Big Think post stems from a recent Wired article by Chris “The Long Tail” Anderson. In it, he attempts to argue that the ability to sort through gigantic databases of information, something he associates with Google, will mean “the end of the scientific method.” As I understand it, his argument is that since we now have so much data, we can simply use algorithms to find correlations in it, and that this will produce as much insight as years of traditional scientific research. The piece is entitled “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.” Kevin Kelly (another Wired alumnus) has a somewhat related post on his blog, The Technium, entitled “The Google Way of Science.”

I think Anderson’s piece is an interesting thought experiment, and it forces us to think about how the sheer quantity of data available to us changes how we do things. However, like many others who have responded to his article (check the comments on the article for more), I think it has a number of serious flaws. They are all summed up in the title, which implies that having a lot of data and some smart algorithms to sift through it means “the end of the scientific method.” That’s just ridiculous. It reminds me of political scientist Francis Fukuyama, who wrote a book in the early 1990s about “the end of history” in which he argued that the clash of political ideologies was more or less over and that liberal democracy had effectively won. As we’ve seen since then, this was close to complete rubbish.

Anderson argues that “The Petabyte Age is different because more is different.” There’s no reason to believe that this is true, however. Expanding the amount of data, even exponentially, doesn’t change the fundamental way the scientific method functions; it just makes it a lot easier to test a hypothesis. That’s definitely a good thing, and I’m sure scientists are happy to have huge databases, data-mining software and all those other good things. But that doesn’t change what they do, only how they do it. With all due credit to Craig Venter and his genome-sequencing work, sequencing DNA and sifting through reams of data about base pairs can help tell us where to look, but not what to look for, or what it means.

Whenever a game-changing technology like Google comes along, it’s tempting to extrapolate its benefits to virtually every sphere of our lives: “Hey, this thing Archimedes came up with called the screw is the best thing ever; now we never have to use nails or pulleys again!” But taking what Google does with PageRank and extending it to all of scientific research is absurd (Kevin Kelly thinks so too). Even Google’s fiercest defenders would probably take issue with Anderson’s argument that its approach to ranking pages works because “If the statistics of incoming links say it is, that’s good enough. No semantic or causal analysis is required.” The fact is that many of Google’s results are useless, even when PageRank is functioning exactly as advertised.

And for the record, correlation still doesn’t mean causation, and likely won’t for the foreseeable future. Correlation just means that you found some data that shares some kind of relationship with other data; it can help suggest causation, but it doesn’t replace it.
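The problem with mining for correlations can be made concrete with a small simulation. This is a minimal sketch in Python, not anything from Anderson’s piece: it generates many series of pure random noise, which by construction have no relationship to one another, then checks every pair for the strongest correlation, the way a brute-force correlation miner might. The function names, the Gaussian noise, the 20-point series length and the fixed seed are all illustrative assumptions.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def strongest_chance_correlation(num_series, length, seed=0):
    """Return the largest |r| between any two of `num_series` noise series.

    Hypothetical helper: every series is independent Gaussian noise, so no
    series is causally related to any other, and any correlation found here
    is pure coincidence. Requires num_series >= 2.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    series = [[rng.gauss(0.0, 1.0) for _ in range(length)]
              for _ in range(num_series)]
    # Check every pair, as a brute-force "find the correlations" miner would.
    return max(abs(pearson(a, b)) for a, b in itertools.combinations(series, 2))


if __name__ == "__main__":
    for k in (10, 100, 300):
        r = strongest_chance_correlation(k, length=20)
        pairs = k * (k - 1) // 2
        print(f"{k:3d} unrelated series ({pairs:5d} pairs): strongest |r| = {r:.2f}")
```

Because the same seed is used, each larger run contains the smaller run’s series, so the strongest chance correlation can only stay the same or grow as more data is added. With a few hundred noise series of 20 points each, some pairs will typically correlate well above 0.5, and a miner that ranks correlations without any causal model would report exactly those as its most interesting findings.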

Comments (11)

  1. simoncast wrote:

    On very large data sets, correlation means very little, causal or otherwise. On really big data sets, the probability that data will correlate while having no link whatsoever increases. With a large enough data set, it is almost certain that some random bits of data will correlate.

    Sunday, June 29, 2008 at 12:29 pm #
  2. ianbetteridge wrote:

    Exactly. And that raises the question “How do you decide which algorithm to use?” Anderson seems to have no answer to that.

    Sunday, June 29, 2008 at 12:31 pm #
  3. simoncast wrote:

    Flippant answer is “know what you are doing” :) I've not looked into it for a while, but there are a lot of techniques in more esoteric math that will probably find use here. I think branches such as topology and set theory could yield some interesting results.

    I expect we'll see lots of bloggers trotting out their high school math without really understanding the limitations of the techniques.

    Sunday, June 29, 2008 at 12:42 pm #
  4. mathewi wrote:

    Excellent point, Simon. A nice example of why huge amounts of data might actually make it *harder* to apply Anderson's theory in some ways, rather than easier.

    Sunday, June 29, 2008 at 12:36 pm #
  5. ianbetteridge wrote:

    Part of the problem with Anderson's theory is that there are many, many different algorithms that work for any given set of data – and without causality, you have no reasonable way of determining which of the competing algorithms gives the best (and thus most likely) answer.

    Sunday, June 29, 2008 at 12:30 pm #
  6. David wrote:

    Haha, wow, I had not seen that Chris Anderson piece. Totally agree with your take on it. It's like those scientists from the late 1800s who were pretty sure that basically all scientific knowledge had been discovered, and everything left was just cleanup, filling in a few blanks.

    These “the end of X” proclamations are wrong so much of the time it's basically useless to make them. Does Chris Anderson really want to make the same argument that so many Doomsday cults for the last 2000 years have been making? It's so over the top that I suspect he sensationalized it on purpose to garner attention.

    Sunday, June 29, 2008 at 12:31 pm #
  7. ianbetteridge wrote:

    Well, they sell books, don't they? :)

    Sunday, June 29, 2008 at 12:59 pm #
  8. Alistair Croll wrote:

    Mathew, I think you have a point. Even if the technology finds every correlation, we still need science to prove causality.

    I started to write an increasingly long comment here after reading this, then went and stuck it at… instead.

    Besides, this way I got to post a Google LOLcat ad.

    Thanks for getting me thinking!

    Monday, June 30, 2008 at 12:09 pm #
  9. mathewi wrote:

    Thanks, Alistair — good post.

    Monday, June 30, 2008 at 12:39 pm #
  10. JoeDuck wrote:

    A great Big Think item, Matt, and unless he qualifies his idea further, I'm with you on this one.

    However, it seemed to me he's making a more reasonable and subtle point than the mistaken suggestion that correlation = causation.

    Generally, science bases descriptions of behavior, biology or other phenomena on data *samples*. As the sample size approaches 100%, our models become closer to the full reality rather than just a model of that reality. I don't think we are anywhere near the point of having enough data to do much more than target ads a little better, but in the areas where we have huge data sets, I think we will start to find that Google-style analysis can predict and describe things better than any previous models.

    Far more significant will be conscious computing, which is likely to change the game for everything and everybody almost as soon as that Genie's out of the bottle – probably in about 15 years.

    Tuesday, July 1, 2008 at 4:33 am #
  11. Shannon T Alston wrote:

    Nice article! Nice site. You're in my RSS feed now ;-)
    Keep it up.

    Monday, February 2, 2009 at 8:19 pm #

Trackbacks/Pingbacks (2)

  1. Does Big Search change science? | Bitcurrent on Sunday, June 29, 2008 at 6:15 pm

    […] Ingram at the Globe and Mail takes Wired to task over a recent article that implies Big Search will save the world and change the way we solve […]

  2. […] pose and falsify or support hypotheses. Mathew Ingram takes issue with the Wired article in Google and the end of everything and Alistair Croll piles on in Does Big Search change science? emphasizing the familiar scientific […]