Tuesday, May 15, 2007

Competing with Google & others

In their final assignment, my students had to decide if a mythical library should digitize part of its holdings or not. What was important to me were the premises they based their decision on and their thought processes. Had they learned enough to argue for or against?

Several decided that they would not digitize any of the old first edition books by famous authors because they felt there was a good chance that Google (or Microsoft or OCA or...) might digitize those books. Why spend money doing what you might be able to use through another source? I found that to be a very interesting argument and, I'm sure, there are real libraries with the same idea.

Assuming that the books are in the public domain, and held by institutions that are cooperating with one of the mass digitization programs, then they should be digitized and made available in full-text for anyone to use. So I searched Google Book Search for "Huckleberry Finn," knowing that an early edition should be in the public domain. What did I find? I found several later editions of The Adventures of Huckleberry Finn that are available with a limited preview. I then looked for only books tagged as "full view books" and found ONE edition that is totally available. The edition was published by Plain Label Books (and who they are is a good yet unanswered question). Yes, this is the content of the book, so the content has been preserved, but you get the sense that the layout is not from an early edition; in fact, it looks sterile, as if it has been retyped.

mmm...so using this limited test, I don't see a first edition copy of Huckleberry Finn in Google. Now the question becomes -- as a casual researcher -- do I care that a first edition is not available? No. Someone who is interested in the book -- the artifact -- itself might want to see the typeface, etc., online but likely would go to the institution for a better view. Using Google's search option to search library catalogues, I can easily find those libraries that have early editions of the book, although it would take some time to figure out who had the earliest edition according to this search (which uses WorldCat). And I would think an early edition would be at Elmira College, but none pop out as being located there. (I'm guessing an early edition would be there, because of Mark Twain's association with that area.)

If a library wanted the content and did not care about the edition, then relying on Google (and others) may be possible. However, if that specific edition that the library has is important (perhaps because of notes in the margins), then the library should digitize it. If the library wants to make its materials known without digitizing them, then getting them catalogued in WorldCat would be quite helpful.

Deciding whether or not to digitize books is not a simple "yes" or "no." You need to think about "why" you want to digitize the books and consider what others are doing (so you don't perhaps duplicate effort). I would not only search to see what has been digitized, but I would also search those partner libraries to see what they own. And if possible, I might contact them to ask specifically if they were going to digitize the books that I had in mind.

The decisions are never as simple as we would hope...



Anonymous said...

The MS book scanning contract specifies that any books sent to MS to be scanned will not be sent back. Therefore only commonly held books would be sent to Kirtas (via MS). In fact, Kirtas has recently bought a cut-sheet scanner and a book-binding cutter machine; apparently it's faster and cheaper to cut the bindings and scan the cut sheets than to use an APT 2400.

For actual rare books, you can't beat the safety and peace of mind of a manual machine (such as a Zeutschel, i2s CopiBook, or Atiz BookDrive DIY), where the operators have full control.

Jill Hurst-Wahl said...

It's Jan. 31, 2008 and I have finally approved the comment that you see above. I'll be honest, I have no idea what the Microsoft contract says. I finally figured it was worth approving the comment, so I could comment on it.

First of all, even Google is cutting the binding on newer (non-unique) books and using a sheet-feeder for them. Those types of scanners can operate very fast. (Just think about the sheet-feeder on your office copier.) However, that type of equipment needs to be used on materials where there is no intrinsic value to the paper itself...and on materials that can withstand that type of handling.

Is Kirtas using "cut sheet" scanners? I don't know. The question is, would it be bad for them to build that type of flexibility? I think the answer is "no."

Finally, the commenter's last paragraph points to other machines that can be used for book digitization. Let me point out that there are many manual book scanners, more than what that person has listed. What a project uses will depend on several factors. Therefore, I encourage projects that are going to do book digitization to cast a wide net when thinking of hardware. In addition, don't forget that it may be more effective to send your materials to a service bureau.