Wednesday, June 28, 2006

Proposal for a regional digitization project, part 6

Yesterday instead of continuing with this "proposal", I responded to a comment made by Kevin Driedger. A comment received on that post was about preservation, which is the topic for today. In that comment, Richard L. Hess wrote:
The other crucial point is that in this century, I suspect we'll see the mantra, if it's not digitized, it is not preserved.

As you know, preservation can take many forms, but geographic diversity of duplicate originals is a key to long-term survivability of records and documents. That is only effectively possible in the digital domain.

...While many of these organizations do an excellent job in preservation, with only one copy, the risks are high. The museum could be struck by lightning or we could see record-breaking rains like those that the DC area just experienced (too close to Harrisburg and Lancaster -- my Dad's home -- for comfort).
Here Richard Hess is talking about preserving the content. Once an item is digitized, then you have a digital file that you can use instead of the original. That digital file preserves the content of the original and is a surrogate for the original. In some cases, it might even be better than the original (e.g., enhanced audio). It is easy to make copies of those digital files, so now you can have lots of copies so that those contents stay safe (the idea behind LOCKSS).
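
To make the "lots of copies" idea concrete, here is a minimal sketch of how copies held at different sites might be compared. It is not the LOCKSS protocol itself, only a simplified illustration. The site names, the `find_damaged_copies` function, and the rule of trusting the most common checksum are all assumptions made for this example.

```python
from collections import Counter


def find_damaged_copies(checksums_by_site: dict[str, str]) -> list[str]:
    """Return the sites whose copy disagrees with the most common checksum.

    checksums_by_site maps a (hypothetical) site name to the checksum of
    the copy held there. In a tie, the checksum seen first is treated as
    the reference, so a real system would need a better tie-breaking rule.
    """
    if not checksums_by_site:
        return []
    counts = Counter(checksums_by_site.values())
    majority_checksum, _ = counts.most_common(1)[0]
    return sorted(
        site
        for site, checksum in checksums_by_site.items()
        if checksum != majority_checksum
    )


# Hypothetical example: three sites hold a copy, one copy has changed.
copies = {"museum": "ab12", "library": "ab12", "archive": "ff00"}
suspect_sites = find_damaged_copies(copies)  # the "archive" copy is flagged
```

With only one copy there is nothing to compare against, which is the risk Richard describes. With three or more geographically separate copies, a damaged copy can be spotted and replaced from a good one.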

So, creating the digital files preserves the content, but then you must preserve those digital files. Many projects have been begun and completed without any thought to long-term preservation of the digital assets. Generally when we talk about preserving the digital assets, we simply talk about the concepts of refreshing and migrating the files. As simple as these ideas might be, procedures and processes must be put into place so that they are done. And that is not so easy or simple.

Using the glossary in the Western New York Regional Digitization Plan, which I worked on with a committee of WNYLRC members:
Refreshing is a technique used to preserve digital content. When files are refreshed, exact copies are made of them on newer media. This is done because of the concern that older media may have a limited shelf life (or may have already outlived its shelf life).

Migration is a technique used to preserve digital content. Migration entails replacing older file formats and internal structures with newer ones. For example, a JPEG file might be migrated to a newer version of that format. The assumption is that the older version will eventually not be supported, so it's better to migrate files to the newer, supported formats.
Since refreshing is making copies of files onto new media, it actually is quite simple, but it requires time and new media. It is a task that is easy to put off. However, if it is not done regularly, the project runs the risk of having its media age and degrade, thus ruining the files stored on it. (If you have lots of copies, then hopefully a disaster has been mitigated.) The New York State Archives is recommending (in a yet-to-be-published guideline) that organizations review their files every six months to determine if they need to be refreshed, rather than automatically refreshing the files on a regular schedule.
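
As a rough illustration of what "exact copies" means in practice, here is a minimal sketch of refreshing a single file and confirming that the new copy matches the old one. The function names (`sha256_of`, `refresh_file`) and the choice of SHA-256 as the checksum are my assumptions for this example, not part of the WNYLRC plan or the State Archives guideline.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_file(source: Path, new_media_dir: Path) -> Path:
    """Copy source onto the new media and verify the copy is exact.

    Raises IOError (and removes the bad copy) if the checksums differ.
    """
    new_media_dir.mkdir(parents=True, exist_ok=True)
    target = new_media_dir / source.name
    shutil.copy2(source, target)  # copies the content and file timestamps
    if sha256_of(source) != sha256_of(target):
        target.unlink()
        raise IOError(f"Refresh of {source.name} failed verification")
    return target
```

The same checksum could also be recorded at the time of digitization and rechecked at each six-month review. A mismatch would suggest that the media has started to degrade and that the file should be refreshed from a good copy.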

And when do you migrate? When file formats have changed and the new formats are stable and are being widely adopted. The key is to migrate before support runs out on the old formats. We'll assume that the migration paths will be easy, but -- of course -- we really don't know that for sure.

Since preserving digital files is something that we are still trying to get our heads around, there continues to be work in this area. Questions being asked include:
  • Can we pinpoint better the "when"?
  • How much does digital preservation cost?
  • Must all digital files be preserved?
  • Must all digital files be preserved at the same level? (Here we get into the concepts of bit-level preservation and full preservation.)
Resources to read more about preservation include:
We often say that digitizing items will lessen the wear and tear on the originals. The thought is that the digital versions will be heavily used, and that the originals can go into storage. However, often more serious researchers become aware of the originals from their digital surrogates and then want to see/use the originals. So those originals may actually be used more. It thus becomes important that the originals be conserved and preserved, if at all possible. Generally, this work is funded out of a different "pot." Depending on the conditions of the originals, some conservation efforts might occur before the materials are digitized, and might be funded by the actual digitization program.

And so what about this mythical project? I would hope that they would think about digital preservation at the beginning of the project and plan for it, even if those plans are rough. I would, however, expect that they might enter into this project without really considering preservation and with the attitude that they will think about it "later." Then, like everyone else, they will hope that "later" does not come too late.

And what about preserving the originals? Since we understand how to conserve and preserve items that we find traditionally in libraries, museums and archives, I would expect that preservation of those items would not be a huge problem (especially for the collaborators of this project). The collaborators should be able to handle the work among themselves and either pay for it out of their operating budgets or get a grant specifically for preservation. In fact, as the collaborators talk about their strengths and what they can donate to the project, there might be one institution that could spearhead (or coordinate) any conservation/preservation efforts.

That's all for today. There is one more area to discuss: marketing. We'll tackle that later this week.

For the first parts of this series, read part 1, part 2, part 3, part 4, and part 5.


1 comment:

Anonymous said...

Hello, Jill,

Great concerns and comments. I have not reviewed all of the links, but I do understand the concerns.

I apologize if my direction is media-centric and skewed towards audio/video records as opposed to paper.

Indeed, within not too many years, the bulk of the content of the audio-video records will only be accessible if it is migrated and refreshed.

To pick a minor nit, "migration" is also used in some circles to mean "refreshing" as defined in the WNYRDP, cited above.

I see migration as defined above to be a longer-term step than refreshing. One of the factors to make this manageable is to limit the acceptable formats in the archive and to keep those formats as open and generic as possible. On the AMIA list we were discussing whether DV converted to files was open enough for archiving.

A friend, Dr. Henry Gladney, who authors the Digital Document Quarterly, refers to refreshing as just part of good data centre management. It has been addressed and, while it is new to many, I think good data centre practice has been well studied. Banks rarely seem to lose data these days, for example.

Jill, I know we're not steering the same course here and looking at the same end purposes, but my excitement is in "tacking on" the digital preservation component to what you're discussing.

With media files (especially tape), we really have little choice but to digitize with the degradation that is happening in the carriers - it's making acidic paper look like a simple problem!

Where I get excited is that digital archives are very expensive to start and run, but they grow very gracefully. In other words, the cost to start up the facility, staff and manage it, and provide the first 100 TB of storage is X, but the next 100 TB of storage may be 0.1 X.

It's the threshold that local, small archives cannot properly ante up and that invites exceptionally creative, but often unsustainable solutions to digital storage.

My vision (and I must give others credit for helping me with it, I did not come to it alone) is that umbrella regional archives can offer this service to the smaller local archives, and do it properly and transparently. And the incremental cost would be low.

Yes, I'm probably making your project bigger than you wanted, but that's the nice part of putting it down in writing first. In an earlier discussion you made that very important point.

I do not see commercial digital repositories as a viable stand-alone model. The risk of deletion due to temporary funding interruption is too high.

I think coupling the regional repository with cultural/research/educational institutions is far safer.

For example, just because the Bird-In-Hand Historical Society cannot ante up their annual storage cost one year is no reason to delete their records. That's why I think the regional archive needs to have its own preservation mandate and why a simple commercial model doesn't work as well.

The option we've seen here at the University of Toronto with their D-Space implementation (called T-Space) is that there is a per-GB fee charged and that pays for storage in perpetuity. Of course, that is only open to the University community.

I guess I'm saying that combining the access and the preservation appears cost effective to me.

One other minor nit: we would not suggest that the "enhanced" audio would be the preservation copy. It could be an enhanced access copy, however.