Ben Zimmer, executive producer of a Web site and software package called the Visual Thesaurus, was seeking the earliest use of the phrase “you’re not the boss of me.” Using a newspaper database, he had found a reference from 1953.
But while using Google’s book search recently, he found the phrase in a short story contained in The Church, a periodical published in 1883 and scanned from the Bodleian Library at Oxford.
Ever since Google began scanning printed books four years ago, scholars and others with specialized interests have been able to tap a trove of information that had been locked away on the dusty shelves of libraries and in antiquarian bookstores.
According to Dan Clancy, the engineering director for Google book search, more than half of the 1 million out-of-copyright books that Google has scanned into its servers have at least 10 pages viewed by users every month.
Google’s book search “allows you to look for things that would be very difficult to search for otherwise,” said Zimmer, whose site is visualthesaurus.com.
A settlement in October with authors and publishers who had brought two copyright lawsuits against Google will make it possible for users to read a far greater collection of books, including many still under copyright protection.
The agreement, pending approval by a judge this year, also paved the way for both sides to make profits from digital versions of books. Just what kind of commercial opportunity the settlement represents is unknown, but few expect it to generate significant profits for any individual author. Even Google does not necessarily expect the book program to contribute significantly to its bottom line.
“We did not think necessarily we could make money,” said Sergey Brin, a Google founder and its president of technology, in a brief interview at the company’s headquarters. “We just feel this is part of our core mission. There is fantastic information in books. Often when I do a search, what is in a book is miles ahead of what I find on a Web site.”
Revenue will be generated through advertising sales on pages where previews of scanned books appear, through subscriptions by libraries and others to a database of all the scanned books in Google’s collection, and through sales to consumers of digital access to copyrighted books. Google will take 37 percent of this revenue, leaving 63 percent for publishers and authors.
The settlement may give new life to copyrighted out-of-print books in a digital form and allow writers to make money from titles that had been out of commercial circulation for years. Of the 7 million books Google has scanned so far, about 5 million are in this category.
Even if Google had gone to trial and won the suits, said Alexander Macgillivray, associate general counsel for products and intellectual property at the company, it would have won the right to show only previews of these books’ contents.
“What people want to do is read the book,” Macgillivray said.
Users are already taking advantage of out-of-print books that have been scanned and are available for free download. Clancy was monitoring search queries recently when one for “concrete fountain molds” caught his attention. The search turned up a digital version of an obscure 1910 book, and the user had spent four hours perusing 350 pages of it.
For academics and others whose research questions are not satisfied by a Wikipedia entry, the settlement will provide access to millions of books at the click of a mouse.