Thursday, September 28, 2006

The Espresso Book Machine

Printing books on-demand is undoubtedly the wave of the future. This article published in June 2006 announced a machine that makes this technology more readily available now. On Demand Books has begun beta testing its Espresso Book Machine, which can print black-and-white text for a 300-page paperback with a four-color cover, and bind it together in three minutes. The article states:
"Our goal is to preserve the economic and ergonomic simplicity of the physical book," said [Jason] Epstein [of On Demand Books], who laments the disappearance of backlist and ready access to books in other languages. By printing from digital files, ODB hopes to make warehousing -- —and much of today's distribution model -- —obsolete. "In theory," said Epstein, "every book printed will be digitized, which means the market will be radically decentralized. A bookstore with this technology, without any expense to themselves [other than the machine] can increase their footprint." Of course, that also means that Kinko's or Wal-Mart can transform themselves into mini-bookstores, especially given the machine's affordability. [Dane] Neller anticipates that it will retail for less than $100,000.
Not surprising, Brewster Kahle, who has also been involved both in digitizing books and printing books on demand, will be making books available through this service. New York Public Library is noted as getting one of these machines in September 2006.However, I don't see anything on the NYPL web site about it. Perhaps they are quietly launching the service?


Google's book scan project gets bigger

Google now has an agreement with the Complutense University of Madrid to add its library to Google's book scanning project. This is the first library in a non-English-speaking country to join the project. You can read more about it here.

I doubt that this is raising too many eyebrows. Google is on a mission, even if publishers don't agree with all of the components of the project. The real big news will come when Google and the publishers come to an agreement about scanning copyrighted works in order to create this massive electronic index.


Tuesday, September 26, 2006

Life Cycle Information for E-Literature (LIFE) reports

I've just looked at the summary report from the LIFE project. Published in May 2006, the report discusses the life-cycle of e-literature and the costs associated with the life-cycle. The summary and final reports do not purport to have all of the answers, but they have used predictive work to begin to calculate costs and then ask more questions.

Page three of the final report states this information, which will make you want to read more:
LIFE established that in the first year of a digital assets existence;
  • The lifecycle cost for a hand-held e-monograph is British Pound (BP) 19
  • The lifecycle cost for a hand-held serial is BP19
  • The lifecycle cost for a non hand-held e-monograph is BP15
  • The lifecycle cost for a non hand-held e-serial is BP22
  • The lifecycle cost for a new website is BP21
  • The lifecycle cost for an e-journal is BP206
The total cost is comprised of costs in several categories which are:
  • Acquisition
  • Ingest
  • Metadata
  • Access
  • Storage
  • Preservation
The costs are not always the same, but depend on what the e-literature is.

For many projects, just seeing the information above will make them stop and think. We tend to gloss over the costs and think of some as just being time, but there is a cost to time. And everything we do with a digital asset costs us something.
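To make those figures concrete, here is a minimal sketch of how a project might estimate its own first-year bill from the per-item costs quoted above. The per-item figures come from the report excerpt; the example holdings (and the function and variable names) are invented purely for illustration and are not from the LIFE project.

```python
# First-year lifecycle cost per item, in British pounds, as quoted from the LIFE report.
FIRST_YEAR_COST_GBP = {
    "hand-held e-monograph": 19,
    "hand-held serial": 19,
    "non hand-held e-monograph": 15,
    "non hand-held e-serial": 22,
    "new website": 21,
    "e-journal": 206,
}


def first_year_total(holdings):
    """Return the total first-year cost in pounds for {asset type: item count}."""
    return sum(FIRST_YEAR_COST_GBP[kind] * count for kind, count in holdings.items())


# Hypothetical holdings, made up for this example only.
example_holdings = {
    "non hand-held e-monograph": 200,
    "new website": 10,
    "e-journal": 5,
}

print("Estimated first-year cost (GBP):", first_year_total(example_holdings))
```

Even with made-up numbers, a quick calculation like this shows how a handful of e-journals can weigh as much in the budget as a much larger group of cheaper items.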

To read the reports -- and access additional information -- go to the LIFE documentation page.

The project has ended, but given work being done elsewhere (e.g., iPRES), more will surely be discovered, documented, and shared.


160 megapixel digital camera

Quoting digitizationblog:
From slashdot today: Swiss camera maker Seitz has released a 160 megapixel digital camera. The highest end scanners, like the Cruse CS 155/450 P, produce 150 megapixels. 160 megapixels is 10 times more than higher-end consumer-grade digital cameras' megapixel ratings.
This means that "scanners" that use cameras can become even higher quality. It also means that photos taken of a collection can be incredibly detailed!

Our tools for creating digital surrogates are getting better and better...!

Monday, September 25, 2006

Alone in the Archives

While doing a bit of Internet searching today, I found this blog written by the archivist at Hobart and William Smith Colleges (Geneva, NY). This blog is written by someone who is working on digitization projects and attending to normal archive tasks. It is truly a view from the trenches.

One of the things this person is working on is digitizing scrapbooks. In August, the author, Linda Clark Benedict, wrote:

Tuesday I spent the day with the Colleges’ photographer...Once he was all set up he clicked and I turned pages. I am very excited about seeing these on the web. He would photograph a page, then if there was a folded item, e.g. program, news clip, I would open it and he would photograph it again. The webpage will be set up so if you click on the item it will appear to open.

The scrapbooks we did were the William Smith 50th Anniversary scrapbook, and the Francis Belle Eddy scrapbook. Eddy was in the WSC charter class (1912) and seems to have documented her college life very well. It will be great fun to read in detail. And the scrapbooks, already in bad condition will not have much further handling. I think I will put them on display when they kick off the webpage, however.

Blogging is a great way of sharing information from the trenches. It would be wonderful if more people in the trenches of digitization projects did this. The lessons we all would learn would be invaluable.

Friday, September 22, 2006

OneWebDay: Sept. 22


OneWebDay is a new celebration. (I learned about it from Larry Lessig's blog.) According to the web site:
The mission of OneWebDay is to create, maintain, advance, and promote a global day to celebrate online life.
The wiki (yes, this celebration has a web site, blog and wiki) gives an interesting list of activities, some of which you might want to try even if you can't do them today. For example:
  • Teach the mayor to blog.
  • Wire a town, or create a wireless hotspot.
  • Employees: teach your boss to IM.
  • Parents: get your kids to teach you to IM.
  • Companies: run a virtual meeting for work-at-home employees.
With the work you do, you are helping to create an online world and an online life for people. If you do nothing else, take a moment today to think about the impact you are having OR tell a stranger about the online world you are creating (like the person you purchase your cup of coffee from).


Thursday, September 21, 2006


I was approached this spring to be the subject of a member profile for the Special Libraries Association's journal, Information Outlook. The resultant three-page article, entitled “For Career Growth, Forget the Label and Recognize the Opportunities” (PDF), is in this month's issue. Suffice it to say that being profiled was truly an honor.

These profiles, which are now a monthly feature in Information Outlook, are helping us to see the diversity in SLA. I hope that some day SLA will take the collection of profiles (or pieces of them) and use them with the student chapters, as a way of demonstrating the roles that librarians (information professionals) have.

Thanks to SLA for the recognition! And thanks to Forrest Glenn Spencer for making it a cool experience.

Wednesday, September 20, 2006

Cornell University announces new copyright guidelines governing use of digital course materials

These guidelines -- developed by Cornell University in conjunction with the Association of American Publishers (AAP) -- likely do not answer everyone's questions about the use of digital course materials, but should help the discussions that occur on this topic.

By the way, one of the statements that stood out to me is:
Instructors should not direct or encourage students to print unauthorized copies of course content. Students seeking information about how to make or acquire personal copies for purposes of private study, scholarship, or research should be directed to consult available resources.
I'm sure that instructors saw this as a way of getting around the law. Here Cornell is saying that the practice, in their eyes, is unacceptable. I wish, though, that Cornell's Copyright Information Center said more on what specific resources students should use to obtain legal copies. There should be a link that says something like "Need a legal copy of course materials?" as well as information on the consequences of obtaining illegal copies. This is an opportunity to educate their constituents even more, and I hope they jump on the opportunity.


Tuesday, September 19, 2006


When we digitize, we strive to make something that will last a long time. We struggle, though, with the techniques to ensure permanence and with ideas of what "long time" really means. We know, however, that we want to preserve a record of what has occurred and is happening, and that doing so is important.

Yet there are cultures and rituals that are built on impermanence. Yesterday I attended events at the University at Buffalo, where the Dalai Lama is speaking this week. I witnessed several examples of impermanence, with the most amazing one being a sand mandala. These are created as a meditation and then deconstructed in a ritual that demonstrates how the world is not permanent.

Today I find myself wondering if we are striving to make too many things permanent. Not just in digitization, but in other areas of our lives. We create many things so they will last forever, but should they? Are we missing an important lesson in the ways of nature by not realizing that things are meant to fade and decay? Should everything follow the way of our memories -- slowly fading and changing?

And should we remember that even our permanent records can be imperfect reminders of the events that they document, like these photos of mine?

Undoubtedly my thoughts of impermanence will fade as I immerse myself back into thoughts of digitization and digital preservation. However, I hope that I will keep in the front of my mind questions about whether what we are preserving needs to be preserved.

Event: The 4th Bi-annual Conference of Globalization, Digitization, Access, and Preservation of Cultural Heritage, Nov. 8 - 10, 2006

From the Archives discussion list with additional information,

The Fourth Bi-annual Conference of Globalization, Digitization, Access, and Preservation of Cultural Heritage (referred to as "SOFIA 2006") conference is rapidly approaching and the conference program is now available at the website:

There are 80 presenters scheduled for the main sessions, representing 35 countries. There will also be a poster session presented by LIS students from several different countries as well. Multiple vendors will be present and there will be many opportunities to chat and network with colleagues from around the world and learn about LIS issues faced there.

Register now to ensure a place at this international opportunity! You must register by 30 September to receive the conference rates for the hotel.

  • Libraries, museums, archives, and record centers
  • Digitization and access
  • Intellectual property
  • National and international information policies and projects
  • Preservation
  • National libraries
  • Association initiatives
  • Library/information science education
  • Collaboration and cooperation
  • Digital libraries

Monday, September 18, 2006

Event: The Persistence of Memory Conference, Dec. 5 - 6, 2006

As found through the Digi-States discussion list:

Northeast Document Conservation Center Presents:


December 5-6, 2006
The Marriott University Park Hotel
Tucson, Arizona

THE PERSISTENCE OF MEMORY conference, taught by leading experts in digital preservation, addresses the question of managing and preserving digital assets over the long term. Institutions are rapidly acquiring collections of digitized and born-digital resources. Without intervention, these materials will not survive even a single human career. This conference will highlight evolving best practices for digital preservation and will help institutions take the next steps to preserve their investment.

Conference cost: $325

Registration deadline: Friday, November 10, 2006

To encourage diverse participation in the Persistence of Memory conference, NEDCC is pleased to offer a limited number of scholarships that will cover the registration fee. In awarding the scholarships, particular attention will be given to diverse professionals, applicants from under-funded organizations, and applicants from organizations that serve under-represented communities. The scholarship application deadline is November 1, 2006.

Visit NEDCC's Web site for complete conference information, including scholarships:

This conference is co-sponsored by Amigos Library Services; Arizona State Library, Archives and Public Records; Balboa Art Conservation Center; and OCLC Western Service Center.

Partial funding of this conference is provided by the Institute of Museum and Library Services. NEDCC gratefully acknowledges support for its field service activities by the National Endowment for the Humanities.

Friday, September 15, 2006

Are there benefits to using Flickr to host a digital collection?

We all know of institutions that are augmenting their digital collections through Flickr. Flickr is easy to use and free for many users. But is Flickr a sustainable host for digital collections? Jeremiah Saunders is thinking about this topic and has done some preliminary research into it.
  • If your institution is using Flickr, how are you using it?
  • What do you rely on Flickr for?
  • Do you consider it transient information?
I'd be interested in hearing the answers to those questions as I prepare for a workshop for library staff members in October on social networking tools. I know how I see organizations use Flickr -- including PictureAustralia -- but would enjoy hearing more examples, including problems and concerns. I'm sure Saunders would also enjoy hearing from institutions that are incorporating Flickr into their digital collections.

Do you allow users to post comments in your digital collections?

That is the question being asked by digitizationblog and the SLAIS to CLA Conference blog. The two bloggers are teaming up to locate digital collections that allow commenting. As Jeremiah Saunders, of the SLAIS to CLA Conference blog, says:
...we wanted to compose a list of public institutions, particularly ones with digital collections, that offer a "add a comment" feature. If a picture is worth a 1,000 words, then a commenting feature in a digital collection may be priceless! Imagine how many visitors to a digital collection may have a story to tell about an image.
So if you allow people to leave public comments on your digital collections, let these bloggers know by either blogging about it or going here and leaving a comment. And if you have learned important lessons from allowing comments, please be sure to let them know.

Thursday, September 14, 2006

Report from US-UK Digital Preservation Workshop

Michael Day (of UKOLN) and Helen Hockx-Yu (of JISC) have produced a report from the invitational digital preservation workshop held in Washington, DC during May 2006. According to Day, the final version of this report should appear later this year in the first issue of the International Journal of Digital Curation. A draft version of the report is available in both HTML and PDF formats.

Touring digitization facilities

When you get into digitizing materials, you often wonder what other facilities look like and how they function. The smartest thing to do is to visit facilities -- commercial ones as well as those in organizations similar to your own. What will you learn?
  • What type of digitization do they do? How has that impacted their selection of equipment?
  • How is the area laid out? What needs -- human or equipment -- did they take into account? (Consider environmental factors.)
  • How is the facility staffed? Even if you don't ask this question, the layout and equipment will help you intuit the answer.
  • What additional features did they find important to include? Security system? Waterless fire suppression system? Detailed instructions mounted on the walls for workers?
If you can't visit other sites, then arrange a time to talk to a few by telephone and visit their web sites (which may have good information).

The one lesson you will learn is that not all digitization facilities are the same. Each has different equipment and services, based on the needs of their projects/clientele. That seems like something that you should not have to learn. It should be obvious. But it likely doesn't "hit home" until you see the differences for yourself.

Another benefit of doing this is that you may find resources that you will want to use. Don't want to do microfilm scanning, for example? You may find a facility -- perhaps in a sister institution -- that will do it for you.

By the way, the Metropolitan NY Library Council is hosting two site visits this fall: one at the New York Botanical Garden's Digital Imaging Center and the other at the Frick Art Reference Library's Digital Imaging. Go to METRO's web site for more information.


Wednesday, September 13, 2006

Digitizing materials from President Anwar El-Sadat

We all know that the president of Egypt is Hosni Mubarak. Mubarak has been president since 1981. But who was president before him and what did that person accomplish? That was President Anwar El-Sadat, who took great measures to create peace between Egypt and Israel. One of the collections being digitized by the Bibliotheca Alexandrina is that related to Sadat. According to the web site:
An agreement has already been finalized with President Sadat's family to digitize the collection and negotiations are currently taking place with newspaper agencies, museums and others to receive their collections. A team is being formed for the indexing of the pictures collection and another team to sort and evaluate documents and other information resources available. A workflow has been designed and is being tested for digitization of the pictures, documents, audio and video recordings and other information resources, associating metadata with each item, and incorporating the output into DAR [Digital Asset Repository].
And President Sadat's family is partnering with the Bibliotheca Alexandrina and International School of Information Science (ISIS) on this project.

From the site, one can assume that this project is "in progress." With the 25th anniversary of Sadat's assassination occurring this Oct. 6, I hope they will be able to release some of the digitized materials online. Wouldn't it be interesting -- and likely enlightening -- to read his notes from the peace negotiations he was involved in?

The Bibliotheca Alexandrina is engaged in a number of projects. Some of the projects already have their own web sites, while others do not. What seems to be missing from those that are not yet online is a clear indication of "where they are" in the process and when something will be available. Those clues would help us, their potential users.


Tuesday, September 12, 2006

Event: Access 2006 Conference, Ottawa, Oct. 11 - 14, 2006

This Canadian library technology conference includes a "Panel on National Digital Initiatives." The speakers/topics will be:
  • Toward a Canadian Digital Information Strategy - Susan Haigh, Library and Archives Canada
  • Canada's scientific infostructure: toward universal, seamless and permanent access to information for Canadian research and innovation. - Lucie Molgat, National Research Council, Canada Institute of Scientific and Technical Information
  • Canadian Initiative on Digital Libraries - Bill Maes, University Librarian, Dalhousie University
  • Alouette Canada - John Teskey, Director of Libraries, University of New Brunswick
There will also be talks on digital libraries in Europe, digital projects in Quebec, and one talk will be on DSpace.

There are several informative and entertaining speakers including Paul Miller of Talis. Although Paul does not speak on digitization, if you are attending this conference, I would highly recommend going to his talk. I guarantee you will walk away with useful information and a broadened perspective.

More information on this conference is available here.

Blog Post: Announcing Tesseract OCR

In August, Google released Tesseract OCR into open source. As the Google Code Blog says:
This particular OCR engine, called Tesseract, was in fact not originally developed at Google! It was developed at Hewlett Packard Laboratories between 1985 and 1995. In 1995 it was one of the top 3 performers at the OCR accuracy contest organized by University of Nevada in Las Vegas. However, shortly thereafter, HP decided to get out of the OCR business and Tesseract has been collecting dust in an HP warehouse ever since. Fortunately some of our esteemed HP colleagues realized a year or two ago that rather than sit on this engine, it would be better for the world if they brought it back to life by open sourcing it, with the help of the Information Science Research Institute at UNLV. UNLV was happy to oblige, but they in turn asked for our help in fixing a few bugs that had crept in since 1995 (ever heard of bit rot?)... We tracked down the most obvious ones and decided a couple of months ago that Tesseract OCR was stable enough to be re-released as open source.
There are many projects that could use good OCR software. Although this is not as good as a commercial product (by Google's admission), this may be very useful to projects that cannot afford commercial software. The Tesseract OCR can be downloaded here.

Since the announcement, there has been an update to the product released to fix a couple of problems. Looking at the web site, there may be other problems that need to be fixed. Like other open source products, I'm sure it will be the community of users who will support this. Will Google provide any ongoing support? From what I can see, the answer is "no," so let's hope that a community does form around this product.


Monday, September 11, 2006

Blog post: They didn't get the memo

Seth Godin, who is known for his ideas on marketing, has written a blog post about the adoption of technology. The post gives several statistics including:
  • 31.4% of Americans don't have internet access.
  • 59% of American households have zero iPods in them.
  • 30% of internet users in the US use a modem.
His point is that "all the growth and the opportunity and the fun is at the leading edge, at the place where change happens." Hmm... if you look at his post, you -- like me -- might come to a different conclusion. What I see is that there is a lot of technology in use, but the adoption rate is not as high as we are being led to believe.

For example, 31.4% don't have Internet access (I'll assume that they mean "at home"). What does that mean to the work we're doing to digitize materials and make them available online? Does this mean that we should ensure that people have access to the materials at PCs and kiosks in various locations and not rely on them being able to use them at home?

How should we think about image size and web page design if 30% of Internet users are still connecting by modem? And if we have some users surfing the Internet on their cell phones and PDAs, how do we build our sites to ensure that all of our constituents are served well? What happens if we ignore those on the fringes (those with slow access as well as those on the bleeding edge who are accessing from small screens)?
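As a rough way to think about the image-size question, here is a small sketch that estimates how long a file takes to arrive over a dial-up modem. The 56 kbit/s figure is the nominal speed of a common modem (real throughput is usually lower), and the three file sizes are assumptions chosen for illustration, not measurements from any collection.

```python
def download_seconds(size_kilobytes, link_kbps):
    """Estimate transfer time: convert kilobytes to kilobits, then divide by link speed."""
    return size_kilobytes * 8 / link_kbps


MODEM_KBPS = 56  # nominal dial-up speed; actual speeds are typically lower

# Hypothetical file sizes in kilobytes, for illustration only.
example_images = [
    ("thumbnail", 20),
    ("screen-size JPEG", 150),
    ("large detail image", 2000),
]

for label, size_kb in example_images:
    seconds = download_seconds(size_kb, MODEM_KBPS)
    print(f"{label}: about {seconds:.0f} seconds over a modem")
```

Under these assumptions, a page built around large detail images would keep a modem user waiting for minutes, which argues for offering smaller versions first and letting people choose when to fetch the big files.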

I think his post suggests many interesting questions. I hope people will not ignore the fact that they need to be answered.

Friday, September 08, 2006

Blog Post: Google and Michigan block access outside U.S.

In the English corner of this German blog (Archivalia) is a post about Google and the University of Michigan not allowing access to the books they are digitizing to people located outside of the U.S. This blogger has asked friends/colleagues in various countries to check this out and all reported not being able to access a specific "test" book in the UMich-Google collection. The error message they are seeing claims that there are copyright restrictions, yet they are trying to view a public domain work. Very strange.

If you're outside of the U.S., you might want to read this post and give some input. (And leave a message here too.) When you search Google Book Search for Emanuel Geibel's Gedichte (the test book they are using), what do you see?


Thursday, September 07, 2006

Image coordinates

In the contract between California Digital Library (CDL) and Google, there is mention on page 5 of the "image coordinates." On page 10, it is noted that CDL cannot sell or make available to others the image coordinates. In hearing and reading about ALTO (Analyzed Layout and Text Object), I now can better describe to myself and others what CDL cannot give to others. It cannot give to others information that describes the layout and content of the pages that have been digitized. When one searches OCR'd text, it is the image coordinates that tell the search engine where the word you searched is located in the image.
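To picture what image coordinates are, here is a minimal sketch of the idea: each OCR'd word is stored with the page it is on and the rectangle it occupies in the page image, and a search looks the word up to find where to highlight. This is not ALTO's actual schema or any vendor's format; the class, field names, and sample words are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    """One OCR'd word and the rectangle it occupies in the page image (in pixels)."""
    text: str
    page: int
    x: int
    y: int
    width: int
    height: int


def build_index(word_boxes):
    """Map each lowercased word to every place it appears."""
    index = {}
    for box in word_boxes:
        index.setdefault(box.text.lower(), []).append(box)
    return index


def find_word(index, query):
    """Return the boxes for a searched word, or an empty list if it never appears."""
    return index.get(query.lower(), [])


# Hypothetical OCR output for two pages.
ocr_output = [
    WordBox("Gedichte", 1, 120, 80, 310, 60),
    WordBox("von", 1, 450, 80, 90, 60),
    WordBox("gedichte", 2, 75, 400, 180, 32),
]

index = build_index(ocr_output)
hits = find_word(index, "Gedichte")
for hit in hits:
    print(f"page {hit.page}: highlight at ({hit.x}, {hit.y}), {hit.width}x{hit.height}")
```

Without this layer of data, a search could still report that a word occurs in a book, but it could not point to the spot on the page image where the word sits.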

Yesterday, in seeing a demo of a particular digital content management system, I wondered if the image coordinates being built would be transferable. Even if the image coordinates were built by that system, could they be moved to another content management system? Or would they need to be rebuilt? It was a technical question that I did not ask, given the non-techies in the room. Perhaps the answer is an easy one (and maybe I'm concerned about nothing). Something for me to follow up on...

Tuesday, September 05, 2006

"Build your work around a key question."

In talking about writing books, Roy Peter Clark gives 50 tips. One of the tips is:
Build your work around a key question.
Could we say the same about digital collections? Is one way to ensure a collection's importance and use to build the collection around a key question? Doing so would help in material selection, since every item would be related to that key question. It would help in marketing the collection, since you would be marketing it to people who need to figure out the answer to that question. And it would help users know why they should be using the collection.

Building a collection around a key question would do one more thing: it would ensure that the project was worth doing. In evaluating the potential project, consider:
  • Can you verbalize the question?
  • Do you understand who would be interested in the answer?
  • Do the materials help to explain the question in a way that will be useful to those seeking an answer?
  • Are these materials needed to help answer the question? (Or can the question be answered without these materials?)
I know that Clark was not thinking about digital collections when he wrote his list. I find it interesting, though, that at least one tip is useful in thinking about our projects.


Monday, September 04, 2006

Labor Day -- The unofficial end of summer in the U.S.

Most U.S. blogs will be quiet today. Labor Day is our last summer holiday. It not only marks the unofficial end of summer, but heralds the start of the school year.

Labor Day started as a holiday in 1882. In 1884, its celebration moved to the first Monday of September and has stayed on that day ever since. You can read the history of Labor Day on the U.S. Department of Labor web site.

However, in many countries Labour Day -- or the International Workers' Day -- is celebrated on May 1. This holiday also traces its history to the U.S. and the Haymarket Riot of 1886 in Chicago. To read about that Labour Day, go to this Wikipedia site.

Thanks to our efforts to digitize history -- whether that means retyping information or digitizing old articles and photos -- it is easy to share information on the two Labor Days and hopefully see both as worth celebrating.

Have a good holiday!

Friday, September 01, 2006

Sharing the e-book "The Master Key System"

I downloaded (legally) and have begun to read The Master Key System, a book published in 1916 and now being offered as a free e-book. The second page of the book states the following:

The original text is now in the public domain.

However, this free e-book edition is not in the public domain. It cannot
be shared, distributed or reproduced in whole or in part.

If you would like to share this e-book with others please direct
them to our web site
where a legitimate copy can be downloaded for free.

What is different about this e-book compared with the original?
  • It has been digitized
  • It has a new front cover
  • There is a new preface
Indeed, the people offering this e-book have created a derivative work. In fact, they did what many digitization projects do -- took something that is in the public domain and made it available to a new audience by creating an electronic copy. Having done that work, they ask that people register (free) with them to receive a copy, rather than getting a copy from someone else. However, they've done nothing to prevent people from sharing copies. And no parts of the e-book contain a copyright statement (although copyright on the preface would be implied).

I wish the people offering this e-book had thought more about how they were sharing the book. For example:
  • Why should people go to their web site to obtain a copy? If you want me to go to your web site to do something, tell me why there is a benefit to doing so. Why should I take the extra steps? What's in it for me?
  • Why shouldn't people share their copies? What's in it for them if they don't share the e-book?
  • Can you make it easy for people to share a link or recommend the e-book?
  • What does it mean that this e-book isn't in the public domain? Does that mean that it is copyrighted (which they never state)? Can this be explained better so it makes sense to users?
If you have read this far, think about the things on your web site -- perhaps as part of your digitization program -- that you want people to use.
  • Have you made it clear what they can do with the materials and how?
  • Have you made using the materials easy?
  • If you don't want people to share the materials, have you explained that in a way that makes sense to them? Have you given them easy options for sharing pointers or links to the materials?
We tend to shy away from the explicit, yet it clears up doubts when we say what we really mean. And it helps those around us when we not only say what we mean, but then give them tools to easily comply with our wishes. (I'm thinking of how easy it is to e-mail a blog post link from Bloglines.) If we do those things, our users will thank us.