Digitization 101: Google book search

Saturday, February 18, 2006

Google book search

Jim Jacobs of diglet posted on Thursday (Feb. 16) about Google book search. His post ties in nicely with one I wrote in November about the poor quality of the scanning Google is doing. He writes that Daniel Clancy, Engineering Director for the Google Book Search Project,

...mentioned that Google was NOT going for archival quality (indeed COULD not) in their scans and were ok with skipped pages, missing content and less than perfect OCR -- he mentioned that the OCR process AVERAGED one word error per page of every book scanned! The key point that I took away from this is that Google book project IS NOT an alternative to library/archive/archival/preservation scans.

When we digitize materials, we want to digitize them only once. Therefore, we want the digital asset that we create to be the best that it can be. I agree with Jim Jacobs of diglet that the libraries involved with Google should not be pleased with the quality that Google is turning out. (And neither should we.) If those institutions want archival-quality scans of their books -- especially the older, fragile works -- they will need to digitize them again. If they want to preserve the full contents of the books, they will also need to digitize them again, since we know that Google's scans skip pages and miss content.

Most disturbing was reading that Google is okay with the quality it is producing. Let's hope that another book digitization project will show Google how it should be done.

Technorati tags: Google, Digitization
