Published on Taipei Times
http://www.taipeitimes.com/News/worldbiz/archives/2005/10/04/2003274449

New alliance plans to digitize mass of books

INTERNET LIBRARY: Yahoo is leading an unlikely grouping of enthusiasts who aim to convert hundreds of thousands of books into digital format for use on the Internet

NY TIMES NEWS SERVICE, NEW YORK
Tuesday, Oct 04, 2005, Page 12

An unusual alliance of corporations, nonprofit groups and universities was set to announce an ambitious plan yesterday to digitize hundreds of thousands of books over the next several years and put them on the Internet, with the full text accessible to anyone with a computer.

The effort is being led by Yahoo, which appears to be taking direct aim at a similar project announced by its archrival, Google, whose own program to create searchable digital copies of entire collections at leading research libraries has run into a series of challenges since it was announced nine months ago.

The new project, called the Open Content Alliance, has the wide-ranging goal of digitizing historical works of fiction along with specialized technical papers. In addition to Yahoo, its members include the Internet Archive, the University of California and the University of Toronto, as well as the National Archives in England and others.

The digitization of print materials has been a continual effort on the part of various research libraries for the last several years. But the potential power of the new collaboration lies in the collective ability of many institutions to compare and cross-reference materials, said Daniel Greenstein, librarian for the California Digital Library at the University of California.

"This is the kind of platform we've been looking for for a long time," Greenstein said.

"Libraries digitize their stuff and put it up, but none of the libraries have comprehensive collections of everything. Now we can say: 'We have this particular edition of Mark Twain, but it's not as good as that one over there,' and we add it to the collection," Greenstein said.

The Library of Congress, for instance, has one of the largest library collections in the world, but even that collection is incomplete. "It's all about gap-filling and collection development," Greenstein said.

The announcement establishes a new round in the battle between Yahoo and Google over index size -- the number of documents that can be found in a search engine's database.

Yet the new project's approach differs from Google's in several ways. Once a book has been digitized, Yahoo will integrate the content into its index and provide a search engine for the group's Web site, www.opencontentalliance.org.