Wed, Apr 09, 2014

Google and the flu: Big data can help us make gigantic mistakes

As Google’s attempt to predict the spread of flu by using search terms shows, a theory-free model based on vast amounts of data can cause plenty of confusion

By John Naughton  /  The Observer

A concept of enduring utility rarely emerges from the market research business, but the Gartner hype cycle is an exception that proves the rule.

It is a graph that describes the life cycle of a technological innovation in five phases. First, there is the “trigger” that sets off feverish excitement and a rapid escalation in public interest, which culminates in a “peak of inflated expectations” (phase two). A steep decline follows, as further experimentation reveals that the innovation fails to deliver on the extravagant claims originally made for it. The curve then bottoms out in a “trough of disillusionment” (phase three). After that comes a slow but steady rise in interest (the “slope of enlightenment,” phase four) as companies discover applications that really do work. The final phase is the “plateau of productivity,” in which useful applications of the idea finally become mainstream. The time between phases one and five varies between technologies and can be several decades long.

As the “big data” bandwagon gathers steam, it is appropriate to ask where it currently sits on the hype cycle. The answer depends on which domain of application we are talking about. If it is the application of large-scale data analytics for commercial purposes, then many of the big corporations, especially the Internet giants, are already into phase four. The same holds if the domain consists of the data-intensive sciences such as genomics, astrophysics and particle physics: The torrents of data being generated in these fields lie far beyond the processing capabilities of mere humans.

However, the big data evangelists have wider horizons than science and business: They see the technology as a tool for increasing our understanding of society and human behavior and for improving public policymaking. After all, if your shtick is “evidence-based policymaking,” then the more evidence you have, the better. And since big data can provide tonnes of evidence, what is there not to like?

Which is why it is pertinent to ask where on the hype cycle societal applications of big data technology currently sit. The answer is phase one, the rapid ascent to the peak of inflated expectations, that period when people believe every positive rumor they hear and are deaf to sceptics and critics.

It is largely Google’s fault. Five years ago, its researchers caused a storm by revealing (in a paper published in Nature) that Web searches by Google users provided better and more timely information about the spread of influenza in the US than did the data-gathering methods of the US Centers for Disease Control and Prevention. This paper triggered a frenzy of speculation about other possible public policy applications of massive-scale data analytics.

“Not only was Google Flu Trends quick, accurate and cheap, it was theory-free. Google’s engineers didn’t bother to develop a hypothesis about what search terms — ‘flu symptoms’ or ‘pharmacies near me’ — might be correlated with the spread of the disease itself. The Google team just took their top 50 million search terms and let the algorithms do the work,” economist Tim Harford said.

Thus was triggered the hype cycle. If Google could do this for flu, surely it could be done for lots of other societal issues. And maybe it can, but in this particular case, the enthusiasm turned out to be premature. Nature recently reported that Google Flu Trends had gone astray.
