Thursday, March 29, 2007

Interview with Brewster Kahle

There is a 12-minute audio interview with Brewster Kahle on the Chronicle of Higher Education web site. The web site teaser says:
Brewster Kahle, director of the nonprofit Internet Archive and leader of the Open Content Alliance, a large-scale book-scanning project, outlines his vision for digital libraries.
Kahle begins by talking about the difference between his project (OCA) and the one being done by Google. The Open Content Alliance is digitizing 12,000 books per month in the U.S. They are doing full color scanning with OCR at 10 cents per page or an estimated $30 per book, according to what Kahle says in the interview. He believes this cost makes digitizing books more feasible for libraries.

As he later points out, digitizing one million books would cost $30 million. That would create a digital library larger than many town libraries hold. He says the library system in the U.S. is a $12 billion/year industry, so this cost would be less than 0.30% of one year's budget. (He doesn't say where he got the $12 billion figure nor how he defines the term "library system".)
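A quick sanity check of the figures Kahle cites (the 300 pages/book is not stated in the interview; it is what 10 cents per page and $30 per book together imply):

```python
# Sanity-check the cost figures from the interview.
cost_per_page = 0.10   # dollars, full-color scan with OCR (Kahle's figure)
cost_per_book = 30.00  # dollars, Kahle's estimate

# Implied average book length.
pages_per_book = cost_per_book / cost_per_page
print(pages_per_book)  # 300.0

# One million books against a $12 billion/year library budget.
books = 1_000_000
digitization_cost = books * cost_per_book  # $30 million
library_budget = 12_000_000_000            # $12 billion/year (Kahle's figure)
print(f"{digitization_cost / library_budget:.2%}")  # 0.25%
```

The 0.25% result is consistent with the "less than 0.30% of the budget for one year" claim above.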

With more people relying on the Internet for information, he argues that making books more findable online is important. (And who would argue against that?)

As always, it is interesting to hear what is on his mind. I find the $30/book figure quite intriguing. In December 2004, during his speech at the Library of Congress, he said that books could be digitized at a cost of $10 per book using a robotic scanner. Since then, he has developed the Scribe book scanner, which is a high-quality manual book scanner. I don't know what scanner and software are being used by the Open Content Alliance, but it would be informative to know what changed to go from $10 per book to $30 per book.



Anonymous said...

Perhaps the 10 cents per page includes the cost of digitizing as well as storage/backup for the scans?

There is an interesting article in the New Yorker about Google's scanning efforts; it reveals details that I don't believe have been mentioned before:

One thing that is different is the vast scale of Google's efforts: they are picking up 1,000 books every workday! And that's just for Stanford, not even taking into account all the other libraries that are partnered with Google.

Also, the chief engineer on the book scanning project mentions on page 3:

“Previously, when people have done scanning, they always were constrained by their budget and their scale,” Clancy told me. “They had to spend all this time figuring out which were the perfect ten thousand books, so they spent as much time in selection as in scanning. All the technology out there developed solutions for what I’ll call low-rate scanning. There was no need for a company to build a machine that could scan thirty million books. Doing this project just using commercial, off-the-shelf technology was not feasible. So we had to build it ourselves.”

Google will not discuss its proprietary scanning technology, but, rather than investing in page-turning equipment, the company employs people to operate the machines, I was told by someone familiar with the process. “Automatic page-turners are optimized for a normal book, but there is no such thing as a normal book,” Clancy said. “There is a great deal of variability over books in a library, in terms of size or dust or brittle pages.”

Bookyards said...

Enjoyed reading your post.

For everyone’s info, we at Bookyards ( ) have compiled a good collection of free digital libraries with books available for downloading for free. Just go to Bookyards “Library Collections - E Books” at
There are approximately 550 digital libraries, separated alphabetically and by category, with over 500,000 unique ebooks.

Bookyards is a free online library located at