Monday, August 28, 2006

Agreement between California Digital Library and Google

The Chronicle of Higher Education has an article about the agreement between the California Digital Library (CDL) and Google. The article includes a pointer to the actual contract. Thankfully, the Chronicle has read the contract and tells us:
According to the document, the university will provide at least 2.5 million volumes to Google for scanning, starting with 600 books a day and ratcheting up over time to 3,000 volumes a day. Materials pulled for scanning will be back on the shelves of their libraries within 15 days.
The contract outlines who will pay for what between Google and CDL, how each party can use the digitized materials, and how branding will be handled.

Everyone will find something of interest in this document. What I find interesting is that the books can be sent off-site to a facility selected by Google for digitization. (The agreement uses the words "provided by" and "controlled by", but does not say "owned by.") The external facility will be named in the project plan. The agreement also says:
Google will use reasonable commercial efforts to ensure that Selected Content is returned within ten (10) business days of its being scanned or after a determination is made by Google that Selected Content will not be scanned. Notwithstanding the foregoing, Google agrees that no materials in a Project will be off University's shelves for longer than fifteen (15) business days or for a longer period as may be specified in the Project Plan.
I know of a facility that was bulking up during the spring, hiring more technicians to do actual digitization, all in anticipation of an upcoming project. Nothing public connects this contract with that facility or vendor, and it may be pure coincidence, so I won't publicly tie the two together. Still, I have to wonder about the impact of 3,000 books a day on any digitization facility. How many book scanners, running 24/7, would you need? Even if the scanners are doing 1,200 to 3,000 pages per hour, that is a tremendous load. (The automated book scanners by Kirtas and 4DigitalBooks fall within that range.)
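A rough back-of-the-envelope calculation helps show the scale. The Python sketch below is only an illustration under stated assumptions: it uses a hypothetical average of 300 pages per book (the agreement gives no such figure) and assumes each scanner runs around the clock at its rated speed with no downtime for page-turning problems, QA, or maintenance, so real-world counts would be higher. The function name `scanners_needed` is made up for this example.

```python
import math

def scanners_needed(books_per_day, pages_per_book, pages_per_hour, hours_per_day=24):
    """Smallest whole number of scanners that can keep up with the daily volume.

    Assumes every scanner runs at its rated pages_per_hour for hours_per_day
    hours, with no downtime (an optimistic, hypothetical assumption).
    """
    pages_per_day = books_per_day * pages_per_book
    capacity_per_scanner = pages_per_hour * hours_per_day
    # Round up: a fraction of a scanner still means buying a whole machine.
    return math.ceil(pages_per_day / capacity_per_scanner)

# Hypothetical average book length; the agreement does not specify one.
PAGES_PER_BOOK = 300

# Starting volume (600 books/day) and target volume (3,000 books/day),
# at the low and high ends of the scanner speed range mentioned above.
for books in (600, 3000):
    for rate in (1200, 3000):
        n = scanners_needed(books, PAGES_PER_BOOK, rate)
        print(f"{books} books/day at {rate} pages/hour: {n} scanners")
```

Under these assumptions, 3,000 books a day works out to 900,000 pages a day, which needs 13 scanners at 3,000 pages per hour or 32 scanners at 1,200 pages per hour, before allowing for any downtime at all.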

Of course, the confidentiality portion of the agreement means that we may never know how things proceed or what problems (or successes) they have. Will they really be able to do 3,000 books per day? Maybe someone will give us a clue.



Anonymous said...

The Kirtas machines are being used for the MS digitization deal. However, they only get about 600 pages per hour from each 2400 machine once image processing, QA, and image cleanup are taken into account. That's what the Internet Archive gets from its manual scanners. At $25,000 versus $189,000, it's not hard to see why the Kirtas machines aren't exactly flying off the shelves.

Jill Hurst-Wahl said...

It's Jan. 31, 2008, and I have finally approved the comment above. Why did it take so long? Honestly, I had several comments waiting for moderation that I really needed to think about. I want comments to be honest as well as true. In this case, the commenter says that "Kirtas machines aren't exactly flying off the shelves"; however, I think we need to consider the word "flying." Kirtas is obviously selling machines and creating newer models. From what I can tell, they are selling more machines than we (the public) are aware of.

It is important to realize that every book digitization machine has its market. A machine that costs $189,000 may be perfect for a particular project or service bureau, while a less expensive book digitizer may be better for someone else.