Friday, March 30, 2007

Event: 3rd International Digital Curation Conference

From the sigdl-l discussion list.

The UK Digital Curation Centre (DCC), the US National Science Foundation (NSF) and the Coalition for Networked Information (CNI) are pleased to jointly announce the 3rd International Digital Curation Conference to be held on Wednesday 12th - Thursday 13th December 2007 at the Renaissance Washington Hotel in Washington DC, USA.

Entitled "Curating our Digital Scientific Heritage: a Global Collaborative Challenge," the conference will focus on emerging strategy, policy implementation, leading-edge research and practitioner experience, and will comprise a mix of peer-reviewed papers, invited presentations and international keynote speakers.

Further details and a Call for Papers will be published shortly at

The event will follow the Fall 2007 CNI Task Force meeting, which will be held on Monday 10th - Tuesday 11th December, also at the Renaissance Washington Hotel, Washington DC.

More information about the DCC can be found at

Thursday, March 29, 2007

Interview with Brewster Kahle

There is a 12-minute audio interview with Brewster Kahle on the Chronicle of Higher Education web site. The web site teaser says:
Brewster Kahle, director of the nonprofit Internet Archive and leader of the Open Content Alliance, a large-scale book-scanning project, outlines his vision for digital libraries.
Kahle begins by talking about the difference between his project (OCA) and the one being done by Google. The Open Content Alliance is digitizing 12,000 books per month in the U.S. They are doing full color scanning with OCR at 10 cents per page or an estimated $30 per book, according to what Kahle says in the interview. He believes this cost makes digitizing books more feasible for libraries.

As he later points out, digitizing one million books would cost $30 million. That would create a digital library larger than many town libraries. He says the library system in the U.S. is a $12 billion/year industry, so this cost would be less than 0.30% of one year's budget. (He doesn't say where he got the $12 billion figure or how he defines the term "library system".)
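For anyone who wants to check Kahle's arithmetic, here is a small Python sketch. It assumes an average book of about 300 pages, which is simply the page count implied by 10 cents per page and $30 per book; Kahle doesn't state a page count, and the $12 billion/year figure is his, taken at face value.

```python
# Sketch of the cost figures from the Kahle interview.
# Assumption: PAGES_PER_BOOK = 300 is inferred from $30/book at 10 cents/page;
# it is not a figure Kahle gives.
COST_PER_PAGE_CENTS = 10                   # 10 cents per page (from the interview)
PAGES_PER_BOOK = 300                       # hypothetical average page count
BOOKS = 1_000_000                          # one million books
LIBRARY_SPENDING_DOLLARS = 12_000_000_000  # Kahle's $12 billion/year figure


def cost_per_book_dollars() -> float:
    """Cost to digitize one book, converting cents to dollars."""
    return COST_PER_PAGE_CENTS * PAGES_PER_BOOK / 100


def total_cost_dollars(books: int) -> float:
    """Cost to digitize the given number of books."""
    return cost_per_book_dollars() * books


def share_of_annual_spending(books: int) -> float:
    """Digitization cost as a percentage of one year's library spending."""
    return 100 * total_cost_dollars(books) / LIBRARY_SPENDING_DOLLARS


print(cost_per_book_dollars())          # 30.0
print(total_cost_dollars(BOOKS))        # 30000000.0
print(share_of_annual_spending(BOOKS))  # 0.25
```

The result, 0.25%, is consistent with his "less than 0.30%" claim, but it is only as good as the inputs, and the $12 billion figure is the one we can't verify.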

With more people relying on the Internet for information, he argues that making books more findable online is important. (And who would argue against that?)

As always, it is interesting to hear what is on his mind. I find the $30/book figure to be quite intriguing. In December 2004, during his speech at the Library of Congress, he said that books could be digitized at a cost of $10 per book using a robotic scanner. Since then, he has developed the Scribe book scanner, which is a high-quality manual book scanner. I don't know what scanner and software are being used by the Open Content Alliance, but it would be informative to know what changed to take the cost from $10 per book to $30 per book.


Event: METS meetings in Göttingen, Germany

As posted on the DIGITAL-PRESERVATION discussion list.

Please note that the dates, programs, and registration are now set and open for the METS -- Metadata Encoding and Transmission Standard -- events planned for May 4-8, 2007 at the Goettingen State and University Library in Germany. Hosted by the nestor Project in association with the Digital Library Federation, the METS events are being organized by the METS Editorial Board.

Friday, May 4, 2007, the METS Board is planning its spring Board meeting to which the public is invited. A draft agenda can be found on the METS wiki at: Suggestions for agenda items can be added to the page or sent to the METS listserv. To join the METS listserv, see

Monday, May 7, 2007, a METS Opening Day training event is scheduled. The full program, travel information, and requested registration can be found on the Nestor Project web site at:
The METS Opening Day event is targeted to persons desiring an introduction and overview of METS.

Tuesday, May 8, 2007, a METS Implementors' Meeting (MIM) is scheduled with full program and registration details found at the above Nestor site. The MIM event is aimed at participants who have a basic knowledge of METS and who are interested in knowing more and/or discussing technical details related to implementation of METS in their digital library or institutional repository environment. General topics for the MIM include METS Profiles, METSbeans, a METS API for Java, and METS tools. More specific questions, topics, introductions to tools, and profiles are being solicited from participants.
If you've got something to bring to or discuss at the MIM, please sign up on the METS wiki at:

There are no charges for any of the METS events thanks to the sponsorships of the Nestor Project and the Digital Library Federation members.

Questions about any of the events can be directed to the METS listserv or to the following contacts:
  • METS Editorial Board meeting: Nancy Hoebelheinrich
  • METS Opening Day: Markus Enders
  • METS Implementors' Meeting: Brian Tingle


Wednesday, March 28, 2007

Testimony given by the Librarian of Congress

On March 20, the Librarian of Congress -- Dr. James H. Billington -- gave testimony before the U.S. House of Representatives Subcommittee on the Legislative Branch. Here are some highlights from his testimony (which is 7 pages in length): [my emphasis added]
  • It took two centuries for the Library of Congress to acquire today's analog collection—32 million printed volumes, 12.5 million photographs, 59.5 million manuscripts and other materials – a total of more than 134 million physical items. By contrast, with the explosion of digital information, it now takes only about 15 minutes for the world to produce an equivalent amount of information. Researchers at Cal-Berkeley produced estimates of the amount of information produced and circulated on the Internet in 2003 – it was equivalent to 37,000 times the content of one Library of Congress. Most of this information exists only in digital form: so-called born-digital items, many of which are already irretrievably lost.
  • The average life of a Web site has been estimated to be 44 to 75 days, and information not actively preserved today could literally be gone tomorrow.
  • To cite some examples: of a sample set of 56 primary sources identified by the Congressional Research Service to support research on Hurricane Katrina in 2005 and 2006, 21% were no longer available on the Web in 2007. Web sites relating to the national elections of 1994-the first time the Web played a role in such elections—have also vanished. It was not until 2000 that the Library began preserving election Web sites. Political scholars wishing to write the history of how the Web has influenced politics will have to do so without important pieces of the puzzle.
  • In late 1994, we launched a program to digitize 5 million items of American history and culture for educational purposes—the National Digital Library. The budget was $60 million with a 3-to-1 private match for every dollar of congressional support. By the end of the 1990s, the Library had well over 5 million items of American history on-line. We have continued this process and now have more than 11 million items on our American Memory Web site for educational use by teachers and librarians. The Library has benefited from the support of the Ad Council to promote the Library's educational and literacy programs. Our overall Web usage climbs continuously and now stands at more than 5 billion electronic hits each year.
  • Through NDIIPP [National Digital Information Infrastructure and Preservation Program], the Library has built a national network of 67 partners to collect, save and provide access to a body of high-quality research and educational content in digital form. We have been working closely with content providers, technology innovators, libraries, archives, and end-users to advance the science and practice of preserving important at risk materials that are perishable and often exist only in digital form... The Library now manages a total of 295 terabytes of digital content, including 66 terabytes of digital material preserved by our partners across the nation.
  • Just as the Library has acquired, preserved and made accessible more than 134 million traditional analog items (books, manuscripts, maps, music and movies), we are now applying the skills and values of traditional librarianship to the digital world. I have been told by members of Congress and their staff that if they want information, they simply find it on Google, and you can indeed find a flood of information on Google – sometimes hundreds, and even thousands, of sources for a single query. Our goal is to integrate the best available electronic information into the knowledge, judgment and wisdom contained in books and in the minds of our curators so that Congress and the American people continue getting the same authentic, reliable information and knowledge that have been the hallmark of the Library since its inception in 1800. [Comment: Even with the Library of Congress nearby, members of Congress are relying on Google and other search engines for their information needs!]
  • The scope of our digital strategy encompasses every aspect of the Library and envisions our playing a central role for the nation in three ways: (1) digitizing and distributing online for educational purposes primary materials from the Library of Congress and other repositories, (2) gathering and preserving in the Library and other cooperating institutions important digital material produced elsewhere and in danger of disappearing for use by Congress and the nation, and (3) converting as many of the Library's processes and products into electronic and digital forms as possible.
There is also impressive information in his testimony about the new National Audiovisual Conservation Center (NAVCC) located in Culpeper, VA. This indeed will be a state-of-the-art facility.

At the local level, we are not always cognizant of what the Library of Congress is doing. Therefore it is good that Dr. Billington's testimony is available online. Hopefully many people will at least skim it.

I am always hoping that those who are learning (those who are running ahead of the pack and blazing new or different trails) will give presentations at conferences and write articles. So...those of you at the Library of Congress...please make your rounds and visit the library conferences. Talk about what you are doing. Show us pictures. Bring us up to speed. We're all ready to learn from you.

Thanks to M.R. Weaver, one of my students, for pointing out this testimony. I noted in our class discussion that job seekers should pay attention to this quote from the testimony:
We have shared with Congress some of our ongoing efforts to ensure the professional development of our staff, training, mentoring, and performance planning and evaluation. We have a large number of staff who are retirement-eligible, and we will have to hire many new staff with specialized skills that are often hard to find.


Tuesday, March 27, 2007

Learning about digitization -- follow-up

On March 14, I did a blog post entitled "Learning about digitization." Mal Booth, from Australia, posted a very pertinent comment. Yes, workshops can be too generic.

I received an e-mail from someone who wondered why some people don't find workshops useful. The person also wondered about having the commercial community (vendors) present sessions or workshops on digitization at library conferences. Below is my response.

I think the problems with the workshops are:
  • Some workshops are too structured and not flexible enough to address specific participant concerns.
  • People take the wrong workshops. I remember an old story of someone taking a workshop where the focus was on how to digitize photos, but his project was to digitize newspapers. What he learned didn't automatically translate into his situation.
  • People take workshops that are convenient for them to attend and may not have the budget to attend a workshop elsewhere that may be more on target with their needs. (Not everyone can afford to go to the School for Scanning.)
  • Some concepts are difficult to teach in a workshop and are best learned doing hands-on.
  • Some people just learn better outside of the workshop environment.
As for associations getting the commercial community to share its experiences, the associations might be open to the commercial community approaching them about doing conference sessions OR workshops, or perhaps even demos. But I don't think the associations will approach the commercial community; rather, I think the commercial community needs to contact the associations with firm ideas of what they could do (teach) to help association members.

You're correct: the associations will want the commercial community to educate, not to sell product. Unfortunately, if one company goes into sales mode at the wrong time, it will ruin it for everyone.

It would be wonderful if commercial entities did conference sessions (and articles) along with their clients AND talked honestly about their projects. I know from the discussion lists, as well as e-mails and blog comments, that things are not always rosy. Companies need to understand that the lessons they -- or their clients -- learn during a project would be valuable to the rest of us, if shared honestly. We've grown to like transparency.

I'm prepping now for a digitization workshop at Computers in Libraries, so this topic is weighing heavily on my mind.


International Digital Preservation Systems Survey

In case you haven't seen it, here is a survey announcement from Karim and Sally at the Getty. The deadline for participating is this Friday, March 30.

We invite you to participate in the online International Digital Preservation Systems Survey:

This survey is intended to provide an overview of digital preservation system (DPS) implementation. DPS is defined here as an assembly of computer hardware, software and policies equivalent to a TDR (trusted digital repository) “whose mission is to provide reliable, long-term access to managed digital resources to its designated community, now, and in the future”[1].

The survey was produced by the Getty Research Institute departments of Digital Resource Management and Library Information Systems, and will be distributed primarily among members of the Digital Library Federation (DLF). Results will be shared at the DLF Spring Forum, April 23-25, 2007 (Pasadena, California, USA), and with all respondents who provide contact information.

Please respond by March 30, 2007.

Thank you in advance for your participation.


Karim Boughida & Sally Hubbard
Getty Research Institute
1200 Getty Center Drive, Suite 1100
Los Angeles CA 90049-1688 USA
Voice: Tel. (1) 310 440-7335
Fax: (1) 310-440-7780

[1] RLG. 2002. Trusted Digital Repositories: Attributes and Responsibilities. Mountain View, Calif.: RLG, Inc.

Monday, March 26, 2007

The cost of digitization

One of the shockers for my students has been the cost of a digitization program. First, they didn't realize everything that needed to be done in a digitization program (which affects the cost). Second, they wrongly assumed that the digitization itself would cost the most (which it doesn't). Many of them -- as part of the interview assignment they did earlier in the semester -- learned a bit about cost. Next week, as part of the lecture, they will learn more.

As a culture, we don't like to talk about how much things cost. We don't want to brag about bargains and seem too frugal. We don't want to talk about cost if we think we've spent too much. In fact, it is the fear of finding out that we've spent too much that probably keeps us from talking about the costs of things. What if someone else spent less or got a better deal?

Sometimes our vendors also inhibit us from talking openly about prices. If their clients are prohibited from talking about the contracts they sign, then future clients don't know if they are paying more than current clients. Prices can stay high (and go higher).

When can we talk about costs? When we're dealing with commodities. Commodities are those things -- like oil, oranges, and hogs -- that are bought and sold based on price, with the assumption that the quality across vendors (producers) is the same. Digitization is not yet a commodity business. It is not like buying an orange.

From what I can see, very few articles have been written about the cost -- the real cost -- of digitization. Here are three that I have come across:


Friday, March 23, 2007

Library 2.0 social network in Ning

Earlier this month, Bill Drew began a social network on Ning focused on Library 2.0. The network is here. It has quickly grown to more than 700 members. If you're into Library 2.0, you may want to check out this network. There are also links to other L2 networks on Ning.


How I want to view digitized materials


Yesterday I did the keynote presentation at the New York State Educational Media/Technology Association spring conference and I spoke on Second Life. Whenever I speak on Second Life, I always try to talk about why Second Life (SL) is important and what we might learn from it. SL in some ways is a huge sandbox. It allows people to think out-of-the-box about spaces, products and services. SL will undoubtedly impact tools that we will use in the future.

One way in which I hope SL impacts us is in how we view digitized materials. Above is a picture of the Alzheimer's Society of Ontario (Canada) exhibit that was mounted last fall in Second Life. As you walked into the exhibit, you could see the photographs and other pieces the same way you would if you were in a real gallery. For me, it was an amazing experience. Suddenly I understood what was missing from displays of digitized materials: the ability to truly see them in a digital exhibit space.

At a committee meeting this week, we were looking at how images were appearing in a pilot system, talking about how to tweak the page's look and feel, and how people would interact with some of the materials. However, the systems we are all dealing with are not meant to give us -- as an option -- the ability to create virtual galleries. Wouldn't it be cool if you could select a set of images that could be displayed in a virtual gallery, then be able to walk about that gallery? Even better, wouldn't it be cool if your users could select on-the-fly what items they wanted to view in a virtual gallery and then be able to walk around that gallery online?

Am I nuts to want this type of functionality? I don't think so. (If I am, please let me know.) I think if we want to give people the same experience that they would get if they went to a local history collection, museum or archive, then we need something like this.

P.S. -- If this topic sounds remotely familiar, I did talk about this in one of my 2006 year-end posts.


Thursday, March 22, 2007

Humor: Complexity of digitization mathematically expressed

This, from Tom Blake, is a humorous but realistic formula for calculating the cost of digitizing a given quantity of print materials.

Go ahead...have a chuckle!

Tuesday, March 20, 2007

Event: MetaArchive Distributed Digital Preservation Workshop

EVENT: MetaArchive Distributed Digital Preservation Workshop

FOCUS: Provides information and training for institutions that seek to build or join distributed digital preservation networks based on the LOCKSS software.

DATES: May 30-June 1, 2007

LOCATION: Emory University, Atlanta, GA


AUDIENCE: Administrators and System Administrators from institutions seeking to build or join distributed digital preservation networks.

DESCRIPTION: Geared toward teams of administrators and system administrators, this workshop will provide information and specialized training for institutions seeking to build or join distributed digital preservation networks based on the LOCKSS software.

On the first day of the workshop, we will present key strategies for establishing a secure LOCKSS-based digital archive to teams of administrators and system administrators. The second and third days will train system administrators to implement a network and/or an individual node.

REGISTRATION: Participants may register for the entire three-day workshop, or only the first day. Registration is limited to the first 60 registrants. The Workshop registration fee is $150 for one day and $350 for three days. The fee includes breakfast, lunch, and an afternoon break for each day of the workshop. Transportation and accommodation costs are not covered by this registration fee.

Day One
(for teams of administrators and system administrators)

The first day of the workshop will help attendees to develop and strengthen their plans for distributed digital preservation. Presentations and discussions will address the organizational, legal, technical, and financial elements of digital preservation. Workshop instructors will share with attendees the policies and procedures utilized in MetaArchive, including its metadata conspectus tool, its Cooperative Charter document, and its Deposit Agreement template. During the day, workshop attendees will work with instructors to develop digital preservation plans that suit their specific implementation contexts.

Topics to be covered include:

* Network context
* Digital content selection processes
* Metadata capture
* Intellectual property issues
* Content migration
* Organizing a network (models)
* Partnership management tools

Days Two and Three
(for system administrators)

The second and third days of the workshop are designed for system administrators. Workshop instructors will address the technical implementation of a distributed digital preservation network. Attendees will learn how to produce and manage private LOCKSS networks.

Topics to be covered include:

* LOCKSS installation
* Plug-in creation
* Content preparation
* Metadata management
* Monitoring the cache
* Recovering a node

HOSTS: The MetaArchive Cooperative is an independent, multi-state membership association whose purpose is to support, promote, and extend the MetaArchive approach to distributed digital preservation practices. This approach relies upon a distributed preservation network infrastructure that is based on the LOCKSS software. The MetaArchive Cooperative began as a collaborative venture of Emory University, Auburn University, Florida State University, Georgia Institute of Technology, University of Louisville, Virginia Polytechnic Institute and State University, and the Library of Congress. The initiative began in 2004 as part of the National Digital Information Infrastructure and Preservation Program (NDIIPP) supported by the Library of Congress. The MetaArchive Cooperative is responsible for maintaining and extending its methodology and approach to distributed digital preservation, as well as the specific MetaArchive networks that it hosts (e.g., the MetaArchive of Southern Digital Culture).

INTERESTED IN JOINING METAARCHIVE? For more information, please contact Katherine Skinner at

Katherine Skinner, Ph.D.
Digital Projects Librarian
Robert W. Woodruff Library, Emory University
540 Asbury Circle, Atlanta GA 30322
404 783-2534,


Copyright and Fair Use Guidelines for Teachers

The California Student Media Festival has a chart on its web site entitled "Copyright and Fair Use Guidelines for Teachers." Given the organizations behind this Festival -- which include KOCE-TV, Foothill College Center for Innovation, and California School Library Association -- I think the information on the chart has been well vetted. I would feel comfortable pointing teachers to this as a guideline, and then to the law for additional information.

In the chart, they talk about what you can do and then the fine print (including what you can't do). Why do teachers need this? Because the creators of the content haven't told them directly. When we create sites of materials that people will want to use, it would be helpful to them if we stated this information upfront, rather than having people (1) contact us with questions or (2) do what they want, which may not be what we want. If you don't have terms and conditions (or terms of use) on your web site, why not discuss it at a staff meeting and begin to draft something? You might even look at what others have written (for example) and use it as a model.

The same chart is also available on the North Carolina Conference of English Instructors web site. There they explain that it was created by Hall Davidson, executive director of educational services and telecommunications at KOCE-TV in California. A "pretty" version of the chart is available here.


Friday, March 16, 2007

Article: The Digital Ice Age

Published in the December 2006 issue of Popular Mechanics, this article tells us what we already know -- digital content can easily become unreadable for a number of reasons. What may be news for many, though, is that digital content can change because the systems that access it have changed. The article begins with this example:
When the aircraft carrier USS Nimitz takes to sea, it carries more than a half-million files with diagrams of the propulsion, electrical and other systems critical to operation. Because this is the 21st century, these are not unwieldy paper scrolls of engineering drawings, but digital files on the ship's computers. The shift to digital technology, which enables Navy engineers anywhere in the world to access the diagrams, makes maintenance and repair more efficient. In theory. Several years ago, the Navy noticed a problem when older files were opened on newer versions of computer-aided design (CAD) software.

"We would open up these drawings and be like, 'Wow, this doesn't look exactly like the drawing did before,'" says Brad Cumming, head of the aircraft carrier planning yard division at Norfolk Navy Shipyard.

The changes were subtle — a dotted line instead of dashes or minor dimension changes — but significant enough to worry the Navy's engineers. Even the tiniest discrepancy might be mission critical on a ship powered by two nuclear reactors and carrying up to 85 aircraft.
So even if you keep the files successfully for years, will they still mean the same as they did originally?

Imagine history being rewritten not because people decide to change the facts, but because details are lost as files become unreadable or file contents change when the software "reads" the file differently.


Wednesday, March 14, 2007

Learning about digitization

We envision that people who are involved in digitization programs have attended the best possible workshops and conferences in preparation for the work they are doing. Every year, for an assignment, my graduate students interview people who are involved in digitization programs. What they find is that many people learned how to digitize "on-the-job". That learning was supplemented with reading, workshops and perhaps some conferences. But it is in doing that people pick up the most useful knowledge. I know -- that is not shocking. What is shocking, though, are some of the comments about the usefulness (or lack thereof) of the workshops people attended.

As someone who gives workshops, I'm wondering what I and others can do to make our workshops more relevant to those involved in digitization programs. Is the difficulty that every program is different and a workshop may not touch deeply on a topic that is relevant to a specific program? Is it that hands-on experience can be impossible to build into some workshops? Or is it that by the time people attend workshops, they already know the information that the workshop is going to cover?

If you have thoughts on this topic, I'd like to hear them (as would others who give digitization workshops). If you've learned about digitization in various ways, what method was most useful to you? What would have made the workshops more useful?

If you don't want to leave a comment here, you can e-mail me at hurst {at} hurstassociates {dot} com . If you don't want to tell me at all, consider telling your local library consortium which is likely planning some digitization workshops and could use the input.


Tuesday, March 13, 2007

Texas Heritage Online: A federated search system

We try to learn from others, but often other projects and programs don't want to share information about how they did it. This could be due to a need for confidentiality, a belief that they don't have anything to share, a desire to hide their mistakes, or just not enough time to put the information online. Therefore, I was very pleased to see the information available for Texas Heritage Online, "a federated search portal for Texas libraries, archives, and museums with digital collections of cultural heritage materials." Here is a one-page document that gives a technical overview of their federated search software. The information is easy to read and uses only necessary jargon.

Thanks to whoever spearheaded the effort to put this information online. I'm sure that others besides myself appreciate it.


Copyright blogs

The list of blogs that I monitor is constantly changing. Today I added another copyright blog to the list. Here are all of the copyright blogs I monitor currently:
Why? When we think about copyright and digitization, the questions, ideas, trends, and problems will not all be raised by the same people or in the same place. For example, thinking about copyright in terms of a library's digitization efforts for course reserves will occur in one place, while thinking about the impact of copyright on digitizing and sharing our culture will occur someplace else.

You'll notice that I used the word "monitor" above. I can't read everything! (If I did, I'd get no work done.) So I skim and then read those things that are truly important to what I'm doing or thinking. And likely you don't have time to read everything. So remember that this stuff is stored on the Internet and you can use a search engine, if necessary, to find it again.


Sunday, March 11, 2007

Article: History, Digitized (and Abridged)

The New York Times published a long article on digitization, with an emphasis on how materials that are not being digitized may be ignored because they are not available online. The problem? A lack of funding (which should be no surprise to any of us).
At the Library of Congress, for example, despite continuing and ambitious digitization efforts, perhaps only 10 percent of the 132 million objects held will be digitized in the foreseeable future. For one thing, costs are prohibitive. Scanning alone on smaller items ranges from $6 to $9 for a 35-millimeter slide, to $7 to $11 a page for presidential papers, to $12 to $25 for poster-size pieces. (The cost of scanning an object can be a relatively minor part of the entire expense of digitizing and making an item accessible online.)
Later the author writes:
Even with outside help, experts say, entire swaths of political and cultural history are in danger of being forgotten by new generations of amateur researchers and serious scholars.
Is there a solution? As I read the article, I thought of crowdsourcing and indeed -- although not called that -- there is an example later in the text:

...genealogy experts affiliated with the Church of Jesus Christ of Latter-day Saints are fanning out, digital cameras in hand, making copies of genealogically relevant records in 200 cities around the world, including New Orleans. Over the next five years, the church expects to have hundreds of millions of digital images available.

Mr. Metcalfe said economies of scale helped his organization bring down the cost of capturing each image to roughly 20 cents — far less than what a commercial company might charge.

Now there may be image quality issues with how this work is being done, but it is increasing access to the information!
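To put the quoted figures side by side, here is a back-of-the-envelope calculation. It is only a sketch: the 10,000-page collection is a hypothetical size, it assumes one image per page, and, as the article itself notes, scanning is often a relatively minor part of the total cost of getting an item online.

```python
# Rough scanning-cost comparison using the per-item figures quoted above.
# The collection size is hypothetical; one image per page is assumed.

def scan_cost_range(items: int, low_per_item: float, high_per_item: float) -> tuple[float, float]:
    """Return the (low, high) scanning cost for a number of items."""
    return items * low_per_item, items * high_per_item


pages = 10_000  # hypothetical collection size

# Library of Congress range for presidential papers: $7 to $11 per page
loc_low, loc_high = scan_cost_range(pages, 7.00, 11.00)

# Church volunteer capture: roughly 20 cents per image
church_low, church_high = scan_cost_range(pages, 0.20, 0.20)

print(f"Presidential-papers rate: ${loc_low:,.0f} to ${loc_high:,.0f}")
print(f"Volunteer-capture rate:   ${church_low:,.0f}")
```

At these rates the volunteer approach is roughly 35 to 55 times cheaper per page ($7 / $0.20 and $11 / $0.20), before any difference in image quality is taken into account.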

Of course, there are other issues besides the cost of digitization. The article acknowledges that there can be copyright concerns that need to be addressed. However, even if the copyright concerns could be swept away, would we have the resources to digitize and make available all of the analog/hardcopy content that should be digitized? Notice I said digitize AND make available. The cost of digitization is only part of the problem. Would we have the resources to create the metadata, do the transcriptions, create the access systems, and do all the other work needed to ensure that the items could be found AND understood?

Should we begin to place more emphasis on the creation and use of digital finding aids as a way of making people aware of what is available at an institution? I know...for those who want to use the primary source material online, this is not a solution. However, it would build awareness of the materials. Those who are able could then travel to see the items, or work with the institution to view them online. Digitized finding aids could be an important tool that many are overlooking. Everyone wants to digitize the item and forgets that digitizing the pointers to that item can be just as valuable.


Friday, March 09, 2007

OCLC: Paper to Screen—a FREE webcast

OCLC is holding a two-part webcast during March on digitizing newspapers. The webcasts will be on March 15 and March 27. Registration is free. Below is information from the OCLC web site:

March 15, 2007, 3 p.m. EST

Choosing the Right Route: Preparing Newspapers for Online Access
OCLC preservation experts will explain the activities that lead to successful newspaper digitization projects. You’ll see the path those projects take and hear the key steps you can follow to ensure similar results.

Our guest speaker, David Roberts, Wissahickon Valley Public Library Director, will share his journey through the newspaper digitization process and offer lessons he learned along the way.

March 27, 2007, 3 p.m. EST

Delivering and Managing an Online Newspaper Collection
Ron Gardner, OCLC Digital Services Consultant, will discuss how to deliver a first-class user experience with CONTENTdm—from displaying your newspaper collection to enhancing, storing and managing it.


Thursday, March 08, 2007

Blog post: JPEG2000 for digital preservation

Peter Murray has written in his blog, Disruptive Library Technology Jester, about the use of JPEG2000 for digital preservation. Murray sees JPEG2000 as a suitable replacement for TIFF and offers up five reasons why he believes this to be so. His fifth reason is:
The JPEG2000 is an open standard with defined and emerging protocols for guaranteeing compliance with the standard.
Later he says:
One of the concerns about JPEG2000 is some language from the JPEG2000 website about how “undeclared and obscure submarine patents may still present a hazard…” to open use of the standard. This seems like lawyer CYA to me as nothing has come up that I’m aware of in the seven years after the standard was ratified...And, if in the end it is found that a patent would cause an embargo ‘unlicensed’ versions of JPEG2000 codecs for some period of years, we can always run a batch conversion back to TIFF until the embargo period is up and/or something else better comes along.
More programs are using JPEG2000, including the Princeton University Library, whose web site states:

For the most part, we'll be deriving JPEG2000 images from the master TIFF files. JPEG2000 is a recently-developed imaging standard that is based on wavelet technology. Wavelets allow a great deal of end user functionality (like zooming, panning, etc.), while retaining small file sizes and little loss from a great deal of compression. JPEG2000 has its own security model, and can allow for metadata to be stored internally to the image, both of which may prove very valuable for retaining intellectual property rights over materials in the coming years.

JPEG2000s are still not viewable in most browsers, so we've acquired the Aware JPEG2000 server software for dynamically displaying JPEG2000s as JPEGs, while still retaining the same flexible end user interaction and tools.
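The Princeton note says JPEG2000 is based on wavelets, which support zooming and heavy compression with little loss. As a rough illustration (not Princeton's or Aware's code), here is one level of the reversible 5/3 lifting transform that JPEG2000 uses for lossless coding, reduced to a 1-D, even-length signal; the function names and the sample row are my own. The `s` half is a smoothed half-resolution version of the input, which is why a viewer can show a zoomed-out image without decoding everything, and the round trip is exact.

```python
# One level of the reversible (integer) 5/3 lifting wavelet used by JPEG2000
# for lossless coding. Simplified sketch: 1-D, even-length input, symmetric
# extension at the edges. Real codecs apply it in 2-D, over several levels.

def forward_53(x: list[int]) -> tuple[list[int], list[int]]:
    """Split x into a half-resolution approximation s and detail d."""
    if len(x) % 2:
        raise ValueError("this sketch expects an even-length signal")
    m = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict: each detail is an odd sample minus the mean of its even
    # neighbours; past the right edge, x[N] is mirrored to x[N-2].
    d = [odd[n] - (even[n] + even[min(n + 1, m - 1)]) // 2 for n in range(m)]
    # Update: adjust each even sample using neighbouring details, so s is a
    # smoothed half-resolution signal; d[-1] is mirrored to d[0].
    s = [even[n] + (d[max(n - 1, 0)] + d[n] + 2) // 4 for n in range(m)]
    return s, d


def inverse_53(s: list[int], d: list[int]) -> list[int]:
    """Undo forward_53 exactly by running the lifting steps in reverse."""
    m = len(s)
    even = [s[n] - (d[max(n - 1, 0)] + d[n] + 2) // 4 for n in range(m)]
    odd = [d[n] + (even[n] + even[min(n + 1, m - 1)]) // 2 for n in range(m)]
    x: list[int] = []
    for e, o in zip(even, odd):
        x.extend((e, o))
    return x


row = [52, 55, 61, 66, 70, 61, 64, 73]  # one row of 8-bit pixel values
s, d = forward_53(row)
print("approximation:", s)
print("detail:", d)
assert inverse_53(s, d) == row  # lossless round trip
```

Running `forward_53` again on `s` gives a quarter-resolution level, and so on; that pyramid of resolutions is roughly what makes zooming and panning cheap.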

In thinking about storing metadata in a JPEG2000 file, Murray says that he would store the authoritative version of the metadata there. This would provide a backup of the metadata with the actual images.
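Keeping metadata inside the image relies on the JP2 file's box structure: the file is a sequence of boxes, each starting with a 4-byte big-endian length and a 4-byte type, and an XML box (type `xml `) can carry a metadata record. The sketch below, using only the Python standard library, lists the top-level boxes and pulls out any XML payloads. It assumes the XML is UTF-8 and does not validate the rest of the file; `make_box` and the sample buffer are hypothetical, and a real file would also contain header and codestream boxes.

```python
import struct

XML_BOX = b"xml "  # box type for embedded XML in the JP2 format


def iter_boxes(data: bytes):
    """Yield (box_type, payload) for each top-level box in a JP2 byte string."""
    pos = 0
    while pos + 8 <= len(data):
        length, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if length == 1:  # a 64-bit extended length follows the type field
            if pos + 16 > len(data):
                raise ValueError(f"truncated box header at offset {pos}")
            (length,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif length == 0:  # the last box may run to the end of the file
            length = len(data) - pos
        if length < header or pos + length > len(data):
            raise ValueError(f"bad box length at offset {pos}")
        yield box_type, data[pos + header:pos + length]
        pos += length


def embedded_xml(data: bytes) -> list[str]:
    """Return the text of every top-level XML box (assumed UTF-8)."""
    return [payload.decode("utf-8") for box_type, payload in iter_boxes(data)
            if box_type == XML_BOX]


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Hypothetical helper: build a box with an ordinary 32-bit length."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Toy file: signature box, file-type box, then one XML metadata box.
record = "<record><title>Main Street, looking north</title></record>"
sample = (make_box(b"jP  ", b"\r\n\x87\n")
          + make_box(b"ftyp", b"jp2 " + struct.pack(">I", 0) + b"jp2 ")
          + make_box(XML_BOX, record.encode("utf-8")))
print([box_type for box_type, _ in iter_boxes(sample)])
print(embedded_xml(sample))
```

Reading is the easy half; writing an updated record back means rewriting the box and its length, which is a job for a proper JPEG2000 toolkit.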


Wednesday, March 07, 2007

Metadata is more than just cataloguing

Someone left a comment on a recent post reminding me that metadata is more than just cataloguing. Very true. Describing the item is only part of the entire metadata record. For those who don't understand metadata, though, the word 'cataloguing' turns on a few lights. So I need to remember that metadata is like cataloguing, but that you will also want to place additional information in the metadata that is not part of a cataloguing record.
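One way to picture the difference: a record for a digitized item can carry a descriptive part that looks much like a catalogue record, plus technical and rights information about the digital file itself. The sketch below is only illustrative; the field names and sample values are hypothetical and do not follow any particular metadata standard, and the SHA-256 checksum stands in for the kind of fixity value a preservation system might store.

```python
import hashlib
from dataclasses import asdict, dataclass, field


@dataclass
class Descriptive:
    """Roughly what a cataloguing record would cover."""
    title: str
    creator: str
    date: str
    subjects: list[str] = field(default_factory=list)


@dataclass
class Technical:
    """Information about the digital file, not the original object."""
    mime_type: str
    width_px: int
    height_px: int
    capture_device: str
    sha256: str  # fixity value, so the file can be checked for changes later


@dataclass
class DigitalObjectRecord:
    descriptive: Descriptive
    technical: Technical
    rights: str  # access and reuse statement


def fixity(data: bytes) -> str:
    """Return a SHA-256 checksum of the file's bytes."""
    return hashlib.sha256(data).hexdigest()


master_file = b"placeholder bytes standing in for a master TIFF"
rec = DigitalObjectRecord(
    descriptive=Descriptive("Main Street, looking north", "Unknown", "ca. 1910",
                            ["Streets"]),
    technical=Technical("image/tiff", 4000, 3000, "hypothetical scanner",
                        fixity(master_file)),
    rights="Hypothetical rights statement",
)
print(asdict(rec))
```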

The comment was a useful reminder. Thanks to the person who left it.

BTW, librarians come in two forms -- cataloguing and non-cataloguing. I'm a librarian who doesn't like the detail work of cataloguing (or of metadata creation). I want to help define the task and its components, ensure that the fields/descriptions are appropriate, and then let someone else do it. I know there are librarians who derive tremendous joy from each record they create and I'm glad they exist!


Tuesday, March 06, 2007

Talking across time and space

At the iPRES conference in October, Ian Wilson, the Librarian and Archivist of Canada, talked about libraries and archives giving people the ability to talk across time -- from one century to the next, from one millennium to the next. When we go to a library and read a book written in the 1700s about life in that time, we are hearing from people of that era. Although artifacts are very important, their words help to build context.

How do we access those materials and find the pieces of information that will be meaningful to us? If the materials are digitized and either transcribed or OCR'd, then we can search their full-text in order to find salient tidbits. If transcriptions or OCR'd texts are not available, then we rely on the indexing that has been done. This indexing -- or metadata -- can point us in the right direction. It may not point us to the correct chapter and verse, but it should get us to the correct book.
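A minimal sketch of the two routes described above, assuming each item may or may not have transcribed or OCR'd text: search the full text when it exists (and return the passage), otherwise fall back on the subject terms assigned during indexing, which can only point to the item as a whole. The `Item` class, the `find_items` function, and the sample items are hypothetical, and plain substring matching stands in for a real search engine.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Item:
    title: str
    subjects: List[str]               # indexing terms assigned by a person
    full_text: Optional[str] = None   # transcription or OCR output, if any


def find_items(items: List[Item], term: str, context: int = 30) -> List[Tuple[str, str, str]]:
    """Return (title, where_matched, snippet) for each matching item."""
    needle = term.lower()
    hits = []
    for item in items:
        if item.full_text is not None:
            pos = item.full_text.lower().find(needle)
            if pos != -1:
                # Full text lets us point at the passage itself.
                start = max(0, pos - context)
                snippet = item.full_text[start:pos + len(needle) + context]
                hits.append((item.title, "full text", snippet))
                continue
        # Otherwise metadata can at least point us to the right item.
        if needle in item.title.lower() or any(needle in s.lower() for s in item.subjects):
            hits.append((item.title, "metadata", ""))
    return hits


items = [
    Item("Diary, 1790s", ["Farm life", "Weather"],
         full_text="Heavy frost this morning; the wheat is lost."),
    Item("Account book", ["Farm life", "Prices"]),  # not yet transcribed
]
for hit in find_items(items, "wheat") + find_items(items, "farm life"):
    print(hit)
```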

Metadata can be tedious to create, especially when we don't know what will be important to the next generation. How will people in 5, 50 or 100 years look for these items? What will they be looking for? And so we spend time including keywords and other terms (fields) that we hope will give users the search and retrieval options that they will desire.

The problem is that the words we use today to describe many things are not the words we used yesterday, and will not be the words we use tomorrow. We also know that some terms are regional (e.g., sub, hoagie, hero sandwich). And so metadata may need a thesaurus or some "see also" ability so that people are pointed in the right direction. There are systems being built that can automatically create online thesauri (what CNLP might call "Automatic Knowledge Organization Structure Construction"). For the metadata to last and to help us have those conversations across time, we will need automated thesauri or systems that automatically update our metadata with the newest words for the things we have described. Without those systems in place, we will have to rely on people to do the "translations" for us. "What are you looking for? Oh, it used to be called..."
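A tiny illustration of the "see also" idea, assuming a hand-built list of synonym rings (sets of terms that should retrieve one another), here seeded with the sandwich example. Each query term is expanded through the rings before matching, with whole-word matching so that "sub" does not hit "suburban". This is a sketch of simple query expansion, not how CNLP or any particular system builds its thesauri; the names and sample records are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical synonym rings: every term in a set should retrieve the others.
SYNONYM_RINGS: List[Set[str]] = [
    {"sub", "hoagie", "hero sandwich"},
]


def expand_query(term: str, rings: Iterable[Set[str]] = SYNONYM_RINGS) -> Set[str]:
    """Return the term plus every term that shares a ring with it."""
    term = term.lower()
    expanded = {term}
    for ring in rings:
        if term in ring:
            expanded |= ring
    return expanded


def search_with_thesaurus(records: Dict[str, str], term: str) -> List[str]:
    """Return IDs of records containing the term or any synonym as whole words."""
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE)
                for t in expand_query(term)]
    return sorted(rid for rid, text in records.items()
                  if any(p.search(text) for p in patterns))


records = {
    "r1": "Menu card: hoagie, 25 cents",
    "r2": "Photograph of a hero sandwich shop",
    "r3": "Street map of a suburban district",
}
print(search_with_thesaurus(records, "sub"))
```

The design point is that a new or changed term only has to be added to a ring once, rather than to every record that describes the thing.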

Even the transcribed and OCR'd text may need to be accessed through a search engine that has a thesaurus for the same reasons. Passages will be unfindable if we don't know what modern words are used to describe those "ancient" terms.

Most of us don't think about thesaurus creation. That is a task for someone else. But it could be that thesaurus creation, like metadata creation, will be a necessary component of future information access projects. And it is something that perhaps we need to be thinking about now.


Sunday, March 04, 2007

Event: The 2nd International Conference on Digital Information Management (ICDIM'07)

This event will be held October 28-31, 2007 in Lyon, France. Topics include Digital Libraries, Information Management, Information Retrieval and much more. For complete details, please go to the web site.

Friday, March 02, 2007

Collaborative Digitization Program to Merge into Bibliographical Center for Research

The press release will be of interest to many people. It looks like a real win-win.

BCR News Release
For Immediate Release
Contact: Sharon Hoffhines - (303) 751-6277 x102,

March 1, 2007

CDP to Merge into BCR

Aurora, Colo. - The Collaborative Digitization Program (CDP) and the Bibliographical Center for Research (BCR) today announced their intention for the CDP to merge into BCR, effective April 1. Both organizations provide services to libraries and other cultural heritage organizations throughout a multistate area focused in the West. The digital services programmatic offerings of the merged organizations will be known as CDP@BCR.

The merger of the CDP - with its nationally recognized digitization expertise - into BCR - a multistate library cooperative serving more than 1,100 member libraries - will enable BCR to build new service programs in the area of digitization and reach out to cultural heritage organizations in addition to bringing CDP-based training, best practices and guidelines, and consulting services to member libraries.

Nancy Allen, Chair of the CDP Board and Principal Investigator on most of the CDP-related grants, said, "This is a wonderful opportunity to move CDP to a new level of leadership and service quality, combining its expertise with that of BCR and reaching out to all of BCR's members as well as offering BCR technical capacity and expertise to CDP's cultural heritage community partners and members."

"BCR has an excellent reputation for providing quality training to library staff throughout our 11-state region," commented Brenda Bailey-Hainer, Executive Director of BCR. "The merger of CDP into BCR will allow us to expand our offerings into the area of digital collections using the tested curriculum of CDP and to build on the strengths of both organizations and their staffs. In addition, CDP's past success in securing federal grant funding for cutting edge digital projects will assist BCR with moving into this arena."

CDP will move from its current offices at the University of Denver to BCR, also based in Colorado, as of April 2, and the CDP Board will dissolve its nonprofit corporation to merge into BCR, also a not-for-profit corporation. Leigh Grinstead, currently CDP Operations Coordinator, will transition to BCR in a newly created position, CDP Program Coordinator, providing continuity to CDP members and partners. The current Web site material will be merged with the BCR Web site over time. A Digital Services Advisory Group will be formed to advise CDP@BCR on current and new digital services and initiatives. CDP Working Groups will continue to meet to create and update best practices and standards and to assist with future planning for digital initiatives and grant proposals.

For additional information, contact Brenda Bailey-Hainer, the Executive Director of BCR, at (303) 751.6277, or Leigh Grinstead at (303) 871.2006 (before April 1).

The Bibliographical Center for Research (BCR) is a nonprofit, multistate library cooperative that has served the library community since its founding in 1935, providing cost-effective library and information services. Today BCR serves libraries in 42 states, Canada and Guam, many through statewide agreements with state library agencies in 11 western states allowing all libraries in those states to use BCR services as member institutions.

The Collaborative Digitization Program, established in 1999 as the Colorado Digitization Project, enables access to cultural, historical and scientific heritage collections of the West by building collaboration between archives, historical societies, libraries and museums. The CDP provides assistance to the cultural heritage community through best practice guidelines, workshops, and digitization grant funding. The CDP also delivers access to thousands of digital photographs, text, and sound files documenting the history, culture, and science of the west through Heritage West.

Thursday, March 01, 2007

Survey on long-term preservation is now online

Jan Hutar has sent out the following information. His words follow.....
The National Library of the Czech Republic, on behalf of the DPE project team, would like to ask for your cooperation in filling in the "Survey on long-term preservation" questionnaire. A similar survey has been done for national libraries; now we are looking for university libraries and research institutions to complete it. The questionnaire is available online.

The National Library of the Czech Republic is one of the partners in the DigitalPreservationEurope project. DigitalPreservationEurope (DPE) fosters collaboration and synergies between many existing national initiatives across the European Research Area. DPE addresses the need to improve coordination, cooperation and consistency in current activities to secure effective preservation of digital materials. You will find more information on the project's website:

We would like to disseminate results that will be really useful for all the target groups. This is impossible without feedback from them. Please fill in the online survey no later than 15th of March 2007. In any case, you can contact me:

Mgr. Jan Hutar
National Library of the Czech Republic Klementinum 190 110 00 Prague 1
Czech Republic

My March schedule

Just a quick note to point out the calendar on the left side of Digitization 101 that shows the conferences and other events that I will be attending or speaking at in 2007. This month, I'll be speaking at the New York State Educational Media/Technology Association on "Second Life as New Media (and Its Impact on You)." Abstract:
Second Life is a three-dimensional, online digital world being created by its residents. Currently it is inhabited by more than 4 million people, with more joining each day. Also involved in Second Life is a growing list of academic institutions, libraries, not-for-profit organizations and businesses. This presentation will focus on the work being done by these groups in Second Life, why they find it an important place to be, and why you should be paying attention to their efforts.
By the way, you may wonder why you should be interested in Second Life (SL), since it has nothing to do with digitization. First, networking. You'd be amazed who you will talk to in Second Life. The networking is wonderful. Second, there are tools being built and experimented with in SL that may impact how digitized materials are displayed elsewhere on the Internet. Third, it's fun (and we all need a little fun in our lives).