Tuesday, February 06, 2007

Interview with Brewster Kahle

In this interview, Brewster Kahle talks about orphaned works, copyright, book scanning projects and more. Two quick quotes:
We digitize 12,000 books a month and have 100,000 on the site now for free use and download.

We have been able to scan books for a total cost of 10 cents a page, so about $30 a book.
You can read the entire two-page interview here.
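As a rough check on how the two quoted figures fit together, the arithmetic can be sketched as below. The function names are made up for illustration, and the result assumes the "$30 a book" figure describes an average book, which the interview excerpt does not state.

```python
# Back-of-the-envelope sketch using only the figures quoted above.
# Function names are hypothetical; amounts are kept in whole cents
# to avoid floating-point rounding.

def implied_pages_per_book(cost_per_book_cents, cost_per_page_cents):
    """Pages per book implied by a per-book and a per-page cost."""
    return cost_per_book_cents // cost_per_page_cents

def monthly_cost_dollars(books_per_month, cost_per_book_cents):
    """Total monthly scanning cost in dollars at a flat per-book cost."""
    return books_per_month * cost_per_book_cents // 100

print(implied_pages_per_book(3000, 10))   # 300 pages per book
print(monthly_cost_dollars(12000, 3000))  # 360000 dollars per month
```

So 10 cents a page and about $30 a book are consistent with a book of roughly 300 pages, and 12,000 such books a month would come to about $360,000 a month at that rate.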



Anonymous said...

Thanks for the link to the interview. Interesting numbers: Kirtas is currently getting 12 cents per page from Microsoft, and their scanning rate is about 20% of what the Archive is achieving.

Jill Hurst-Wahl said...

mmm...I don't remember details that allow for a true apples-to-apples comparison. If they've been published, I hope you'll share the source.

Anonymous said...

It's not that the Kirtas machines run at 20% of the speed of the Archive Scribe machines. It's more a question of workflow. When the Archive tested an APT 1200 in Toronto, they found that the manual page-turning Scribe did better in throughput per hour. Now the 2400 is faster for scanning than the manual Scribe; however, you have to take into account the significant cost differential (~$30,000 for a Scribe, $190,000 for a 2400), plus the fact that even the 2400 still requires a dedicated operator for page flattening/book safety. Also, with the Scribe workflow, once a page has been scanned, that's it. There is no need for post-processing, as the processing happens in real time as the book is scanned; the operator is watching its progress in addition to turning the pages. I believe the quality of the Archive work is comparable to the Kirtas work; I haven't seen any complaints. They use the same Canon 1Ds-MkII cameras.

The Archive is operating around 15 Scribes just in the Robarts Library at the University of Toronto. I believe this is in addition to Scribes being operated in San Francisco.

Kirtas is operating fewer than 10 machines (APT 2400) at its Victor location. The amount of money required to scale the Kirtas operation to 20 or more machines is significant. Can it be done with an income of 12 cents per page with quality standards maintained? We can only watch and see.
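To put the price gap and the 12-cents-per-page figure in perspective, here is a minimal sketch using only the numbers quoted in this comment. The function names are hypothetical, and the calculation deliberately ignores labor, maintenance, and every other running cost, so it is a lower bound rather than a business model.

```python
# Rough comparison built from the figures quoted above:
# ~$30,000 per Scribe, $190,000 per APT 2400, 12 cents of income per page.
# Function names are hypothetical; labor and running costs are ignored.

def units_for_budget(budget_dollars, unit_price_dollars):
    """How many whole machines a given budget buys."""
    return budget_dollars // unit_price_dollars

def pages_to_cover_price(price_dollars, income_cents_per_page):
    """Pages that must be scanned for per-page income to equal the price."""
    price_cents = price_dollars * 100
    # Integer ceiling division: round up to the next whole page.
    return -(-price_cents // income_cents_per_page)

print(units_for_budget(190_000, 30_000))  # 6
print(pages_to_cover_price(190_000, 12))  # 1583334
```

In other words, the price of one 2400 would buy about six Scribes, and at 12 cents per page a single 2400 would need on the order of 1.6 million pages just to match its purchase price.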

Jill Hurst-Wahl said...

The words "dedicated operator" are a bit misleading. The person who is overseeing the scanning may need to make adjustments to the book's position during the scanning process OR ensure that pages are flattened. However, Kirtas contends that the person does not need to watch over every page being flipped to ensure that it occurs correctly. (They are adamant about this.)

With the Scribe, there is a person automatically involved in every page scan, since it is a manual scanner. So both machines need someone present, although their level of involvement may be (should be) different.

You say that post-processing occurs as the pages are scanned (with the Scribe). Does that mean that something needs to be tweaked to ensure that a page is processed properly, that the operator (who is also scanning the books) needs to stop to take care of that?

Last year, I spent time trying to find details on the Internet about the Scribe and was amazed at how little was available about what the machine is, etc. I haven't checked in quite a while, but I would hope that more information is publicly available.

BTW I found this page a few days ago that gave a revenue number for Kirtas. We tend to think of this as being a small company, because we don't see their machines everywhere. However, this company is having a significant impact on the digitization of bound materials and their machines -- albeit expensive -- are being purchased around the world. There are a significant number of Kirtas machines out there being used. And I must admit I know of groups that are yearning to find funding to acquire a Kirtas scanner.

Anonymous said...

That link you provided for Kirtas revenue numbers appears to be some kind of investor/distributor promotional material. There are no Kirtas machines in Singapore. All machine sales mentioned in that article occurred in 2005.

There are only two companies that own two Kirtas machines (NewsBank being the other one). All the rest that have been sold are single machines, and many of these are owned by distributors.

The total number of Kirtas machines out in the field is fewer than 60 (1200 and 2400). Not all of these are in use anymore. No customer has purchased an APT 800.

That's not to say that Kirtas' competitor (4digitalbooks) is selling many machines either. The demand for >$120K robotic bookscanners is extremely low, especially when you limit the maximum book/page size to 11*14 inches.


Check out magazines 12 and 13; they show some interesting applications of digitization in many countries. They also have some good technical articles.

Most libraries/institutions would be better served buying overhead scanners that can be operated by the students/patrons, for example the i2S CopiBook. There are far more CopiBook machines in operation in US libraries than there are Kirtas scanners. A user of the library could in the future find an interesting article and use the scanner himself, without the harmful effects of photocopying or exposing the document to anything other than normal light levels. He/she would then be able to receive a digital image/document via CD/DVD, USB dongle or perhaps by email. The scanner uses no consumables (except power). Shutter life is limitless. This way the librarians and staff can continue doing their jobs, rather than babysitting robots. And the end of changing toner carts!

This is the only book scanner that I know of where an organization bought one, and then after evaluating its performance, ordered four follow-up units (University of Florida). The unit cost is less than $50K, and they are achieving scan rates of 40 pages in 2 minutes, which comes out to 1200 pages per hour, with page turning done manually. The rate is 600 pages per hour when digitizing large newspapers (something the Kirtas machines cannot scan). The operations manager notes that the student operators can be very easily trained on the CopiBook, whereas other machines including the Kirtas and 4digitalbooks machines require multiple-day training sessions.
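The hourly rate above is a straight extrapolation from a short timed run, which can be sketched as follows. The function name is hypothetical, and the calculation assumes the operator sustains the same pace for a full hour with no time lost to changing books.

```python
# Extrapolate an hourly scan rate from a short timed sample.
# Function name is hypothetical; assumes a constant pace and no
# downtime between books.

def pages_per_hour(pages_scanned, minutes_taken):
    """Hourly page rate implied by a timed sample."""
    return pages_scanned * 60 // minutes_taken

print(pages_per_hour(40, 2))  # 1200
```

Real sustained throughput would presumably be lower once book changes and operator breaks are counted, so the 1200 figure is best read as a peak rate.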

The only libraries that I know of with Kirtas scanners are the Rochester Public Library, and the New York State Library in Albany. Neither of these machines will be accessible to the public.

Given that you are in the vicinity, ask Kirtas if you can visit their service bureau operation. From what I have seen, each machine requires an operator full time whilst the machine is running; you won't see an operator leave the machine whilst it is scanning a book. That's why they now have almost 20 machine operators employed (two shifts on 10 machines). Surely if it were even somewhat automated, an operator should be able to run two machines? For example, load one machine with a book, start it, then move to the 2nd machine and load that book, start it, come back to the 1st, etc. I just don't see the labor reduction. The Kirtas website used to mention 4 machines to an operator. That reference has since been removed.

The process of using a Scribe means that after the image has been taken (both sides at once), the image is displayed on a screen viewable by the operator, and they can move the crop box position if it is about to cut-off text/content. So if there is writing deep in the binding or outer margin for example, they can quickly move the crop box over so content is not lost.

The software that the Archive uses with the Scribe is available as open-source code, so you can run multiple instances of the program if needed. The Kirtas solution requires a USB "dongle" license, and I believe they charge around $5,000 for each additional one you want to run (you get one license included when you buy a machine).

Jill Hurst-Wahl said...

First, I want to thank you for your VERY long and detailed comments. I truly appreciate them.

I have been to Victor and seen the Kirtas service bureau when it was much smaller. I've also seen the machine in operation at Rochester Public Library. So I know that an operator is always there, but I also know that the person may be able to focus away from actual scanning to do something else. It will depend on the book. Can the machine operate totally alone (which is what we all wish)? No.

I actually think we're in agreement on the need for an operator, even though it may not sound like it. We disagree on the amount of attention the operator must have on the individual pages being scanned.

One note about the machine at Rochester Public Library. I know that they were looking to do digitization work for others (generally gov't offices). So there is a possibility that they might take on other work.

Anonymous said...

One correction to the above: Kirtas did sell two APT 1200 machines to a customer in Japan, so that's three customers with two APT 1200 machines.