Google and the flu: Big data can help us make gigantic mistakes

Wed, Apr 09, 2014, page 9

As Google’s attempt to predict the spread of flu by using search terms shows, a theory-free model based on vast amounts of data can cause plenty of confusion
By John Naughton / The Observer
A concept of enduring utility rarely emerges from the market research business, but the Gartner hype cycle is an exception that proves the rule.
It is a graph that describes the life cycle of a technological innovation in five phases. First, there is the “trigger” that sets off feverish excitement and a rapid escalation in public interest, which eventually produces a “peak of inflated expectations” (phase two). After this comes a steep decline as further experimentation reveals that the innovation fails to deliver on the original, extravagant claims made for it. The curve then bottoms out in a “trough of disillusionment” (phase three), after which there is a slow but steady rise in interest (the “slope of enlightenment,” phase four) as companies discover applications that really do work. The final phase is the “plateau of productivity,” in which useful applications of the idea finally become mainstream. The time between phases one and five varies between technologies and can be several decades long.
As the “big data” bandwagon gathers steam, it is appropriate to ask where it currently sits on the hype cycle. The answer depends on which domain of application we are talking about. If it is the application of large-scale data analytics for commercial purposes, then many of the big corporations, especially the Internet giants, are already into phase four. The same holds if the domain consists of the data-intensive sciences such as genomics, astrophysics and particle physics: The torrents of data being generated in these fields lie far beyond the processing capabilities of mere humans.
However, the big data evangelists have wider horizons than science and business: They see the technology as a tool for increasing our understanding of society and human behavior and for improving public policymaking. After all, if your shtick is “evidence-based policymaking,” then the more evidence you have, the better. And since big data can provide tonnes of evidence, what is there not to like?
That is why it is pertinent to ask where on the hype cycle societal applications of big data technology currently sit. The answer is phase one: the rapid ascent to the peak of inflated expectations, the period when people believe every positive rumor they hear and are deaf to sceptics and critics.

It is largely Google’s fault. Four years ago, its researchers caused a storm by revealing (in a paper published in Nature) that Web searches by Google users provided better and more timely information about the spread of influenza in the US than did the data-gathering methods of the US Centers for Disease Control and Prevention. This paper triggered a frenzy of speculation about other possible public policy applications of massive-scale data analytics.
“Not only was Google Flu Trends quick, accurate and cheap, it was theory-free. Google’s engineers didn’t bother to develop a hypothesis about what search terms — ‘flu symptoms’ or ‘pharmacies near me’ — might be correlated with the spread of the disease itself. The Google team just took their top 50 million search terms and let the algorithms do the work,” economist Tim Harford said.
Thus was triggered the hype cycle. If Google could do this for flu, surely it could be done for lots of other societal issues. And maybe it can, but in this particular case, the enthusiasm turned out to be premature. Nature recently reported that Google Flu Trends had gone astray.
“After reliably providing a swift and accurate account of flu outbreaks for several winters,” Harford said, “the theory-free, data-rich model had lost its nose for where flu was going. Google’s model pointed to a severe outbreak, but when the slow-and-steady data from the [US government center] arrived, they showed that Google’s estimates of the spread of flu-like illnesses were overstated by almost a factor of two.”
So what went wrong? Simply this: Google does not know anything about the causes of flu. It just knows about correlations between search terms and outbreaks. Yet as every high-school student knows, correlation is quite different from causation. And causation is the only basis we have for real understanding.
Big data enthusiasts seem remarkably untroubled by this. In many cases, they say, knowing that two things are correlated is all you need to know. And indeed in commerce that may be reasonable. I buy stuff on Amazon both for myself and for my kids, for example, which leads the company to conclude that I will be tempted not only by Hugh Trevor-Roper’s letters, but also by new releases from hot rap artists. This is daft, but it does no harm. Applying the kind of data analytics that produces such absurdities to public policy, however, would not be funny. Yet that is where the more rabid big data evangelists want to take us. We should tell them to get lost.