Wednesday, October 18, 2006

Article: Microsoft in digital book deal

I've had two e-mails wondering if I saw the news. Yes, I had. Another large company is diving into digitizing books.

I had hoped, when the first book digitization project was announced, that those of us not involved would be able to learn from it. I had hoped that those involved would share what they were learning and that the rest of us would "do better" because of it. I want them to blaze a trail and leave some markers along the way.

That hasn't happened because of the competition and thus the need for non-disclosure agreements (NDAs).

And so Microsoft is entering the fray. Microsoft has done work, I believe, with natural language processing (NLP), so there could be some interesting twists to what they will produce. But will they disclose information about their process so that other projects can benefit from what they learn?

I'm not holding my breath.

1 comment:

Anonymous said...

Some more details from Cornell re: the agreement with Kirtas:

http://www.news.cornell.edu/stories/Oct06/library.digitizes.ssl.html

Whilst the Kirtas machines are great for small bound softcover and hardcover books, it will be interesting to see what technology or service provider Cornell and MS use for larger-format items (e.g. books larger than 11 × 14 inches, large-format maps, etc.). The Library of Congress does quite a bit of large-format digitization as an outsourcing service.

It will be interesting to find out what resolution the output files being sent to MS will be.

Cornell has previously said that 400 dpi is a good balance between the highest quality (and file size) of 600 dpi and the lower mark of 300 dpi.
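As a rough illustration of why that trade-off matters: the raw pixel count of a scan grows with the square of the resolution, so doubling from 300 to 600 dpi quadruples the uncompressed data per page. The sketch below assumes a hypothetical US Letter page (8.5 × 11 inches) scanned in 24-bit colour with no compression; the function name is made up for illustration, and real output files (TIFF, JPEG, etc.) would usually be compressed, so actual sizes will differ.

```python
def uncompressed_page_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Raw size of one scanned page: pixel count times bytes per pixel.

    Hypothetical helper; assumes no compression and 3 bytes per pixel
    (24-bit colour) unless told otherwise.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Assumed example page: US Letter (8.5 x 11 inches), 24-bit colour.
for dpi in (300, 400, 600):
    size = uncompressed_page_bytes(8.5, 11, dpi)
    print(f"{dpi} dpi: {size / 1_000_000:.1f} MB")
```

Under these assumptions this prints about 25.2 MB at 300 dpi, 44.9 MB at 400 dpi and 101.0 MB at 600 dpi, so 400 dpi costs well under half the storage of 600 dpi per page.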