Wed, Oct 14, 2009 - Page 9

A looming data glut threatens computer science

While we’re only starting to think about buying external hard drives for our PCs, scientists are already thinking in petabytes

By Ashlee Vance  /  NY TIMES NEWS SERVICE , MOUNTAIN VIEW, CALIFORNIA

The types of projects the 14 universities have already tackled veer into the mind-bending.

For example, Andrew Connolly, an associate professor at the University of Washington, has turned to the high-powered computers to aid his work on the evolution of galaxies. Connolly works with data gathered by large telescopes that inch their way across the sky taking pictures of various objects.

The largest public database of such images available today comes from the Sloan Digital Sky Survey, which has about 80 terabytes of data, Connolly said. A new system called the Large Synoptic Survey Telescope is set to take more detailed images of larger chunks of the sky and produce about 30 terabytes of data each night. Connolly’s graduate students have been set to work trying to figure out ways of coping with this much information.
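To put those two figures side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers Connolly cites (about 80 terabytes for Sloan, about 30 terabytes a night for the new telescope). The function names, the decimal conversion of 1 petabyte to 1,000 terabytes and the 365-night year are illustrative assumptions, not figures from the survey teams.

```python
import math

# Figures quoted in the article, in terabytes.
SLOAN_TOTAL_TB = 80       # approximate size of the Sloan Digital Sky Survey archive
LSST_PER_NIGHT_TB = 30    # approximate nightly output expected from the new telescope


def nights_to_exceed(archive_tb: float, nightly_tb: float) -> int:
    """Whole nights needed for the cumulative nightly output to reach the archive size."""
    return math.ceil(archive_tb / nightly_tb)


def cumulative_petabytes(nightly_tb: float, nights: int) -> float:
    """Cumulative volume in petabytes, assuming 1 PB = 1,000 TB (decimal units)."""
    return nightly_tb * nights / 1000


if __name__ == "__main__":
    # Nights until the new telescope has gathered as much as the whole Sloan archive.
    print(nights_to_exceed(SLOAN_TOTAL_TB, LSST_PER_NIGHT_TB))
    # Hypothetical upper bound: observing on every one of 365 nights.
    print(cumulative_petabytes(LSST_PER_NIGHT_TB, 365))
```

On these assumptions the new telescope would pass the entire Sloan archive on its third night, and a full year of nightly observing would run to roughly 11 petabytes.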

Purdue looks to carry techniques used to map the interactions between people in social networks over into the biological realm. Researchers are creating complex diagrams that illuminate the links between chemical reactions taking place in cells.

A similar effort at the University of California, Santa Barbara, centers on making a simple software interface — akin to the Google search bar — that will let researchers examine huge biological data sets for answers to specific queries.

Lin has encouraged his students to illuminate data with the help of Hadoop, an open-source software package that companies like Facebook and Yahoo use to split vast amounts of information into more manageable chunks.
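The idea of splitting a large job into manageable chunks can be sketched in a few lines of Python. Hadoop is built around the MapReduce model, in which each chunk is processed independently and the partial results are then combined. The sketch below imitates that pattern on a single machine by counting words. It does not use Hadoop's own API, and the function names, the sample sentences and the chunk size are all made up for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def split_into_chunks(records: list[str], chunk_size: int) -> Iterator[list[str]]:
    """Cut the input into fixed-size chunks, as a cluster would before handing them out."""
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def map_chunk(chunk: Iterable[str]) -> list[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def reduce_pairs(pairs: Iterable[tuple[str, int]]) -> dict[str, int]:
    """Reduce step: add up the counts emitted for each word."""
    totals: dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def word_count(records: list[str], chunk_size: int = 2) -> dict[str, int]:
    """Run the map step chunk by chunk, then reduce everything in one pass."""
    mapped: list[tuple[str, int]] = []
    for chunk in split_into_chunks(records, chunk_size):
        # On a real cluster each chunk would go to a different machine.
        mapped.extend(map_chunk(chunk))
    return reduce_pairs(mapped)


if __name__ == "__main__":
    print(word_count(["big data", "Big science", "data glut"]))
```

Because each chunk is mapped without reference to the others, the work can in principle be spread across as many machines as there are chunks, which is what makes the approach attractive at Internet scale.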

One of these projects involved a deep dive into the reams of documents released after the government’s probe into Enron. The goal was an analysis system that could identify how one employee’s internal communications were connected to those of other employees and who had originated a specific decision.

Lin shares the opinion of numerous other researchers that learning these types of analysis techniques will be vital for students in the coming years.

“Science these days has basically turned into a data-management problem,” Lin said.

By donating their computing wares to the universities, Google and IBM hope to train a new breed of engineers and scientists to think in Internet scale.

Of course, it’s not all goodwill backing these gestures. IBM is looking for big data experts who can complement its consulting in areas like health care and financial services. It has already started working with customers to put together analytics systems built on top of Hadoop. Meanwhile, Google promotes just about anything that creates more information to index and search.

Nonetheless, the universities and the government benefit from IBM and Google providing access to big data sets for experiments, simpler software and their computing wares.

“Historically, it has been tough to get the type of data these researchers need out of industry,” said James French, a research director at the National Science Foundation.

“But we’re at this point where a biologist needs to see these types of volumes of information to begin to think about what is possible in terms of commercial applications,” he said.
