Digitization 101: Talking across time and space

Tuesday, March 06, 2007

Talking across time and space

At the iPRES conference in October, Ian Wilson, the Librarian and Archivist of Canada, talked about libraries and archives giving people the ability to talk across time -- from one century to the next, from one millennium to the next. When we go to a library and read a book written in the 1700s about life in that time, we are hearing from people of that era. Although artifacts are very important, their words help to build context.

How do we access those materials and find the pieces of information that will be meaningful to us? If the materials are digitized and either transcribed or OCR'd, then we can search their full text to find salient tidbits. If transcriptions or OCR'd texts are not available, then we rely on the indexing that has been done. This indexing -- or metadata -- can point us in the right direction. It may not point us to the correct chapter and verse, but it should get us to the correct book.

Metadata can be tedious to create, especially when we don't know what will be important to the next generation. How will people in 5, 50 or 100 years look for these items? What will they be looking for? And so we spend time including keywords and other terms (fields) that we hope will give users the search and retrieval options that they will desire.

The problem is that the words we use today to describe many things are not the words we used yesterday, and will not be the words we use tomorrow. We also know that some terms are regional (e.g., sub, hoagie, hero sandwich). And so metadata may need a thesaurus or some ability for "see also" so that people are pointed in the right direction. There are systems being built that can automatically create online thesauri (what CNLP might call "Automatic Knowledge Organization Structure Construction"). For the metadata to last and to help us have those conversations across time, we will need automated thesauri or systems that will automatically update our metadata with the newest words for those things we have described. Without those systems in place, we will have to rely on people to do the "translations" for us. "What are you looking for? Oh, it used to be called..."

Even the transcribed and OCR'd text may need to be accessed through a search engine that has a thesaurus for the same reasons. Passages will be unfindable if we don't know what modern words are used to describe those "ancient" terms.
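To make the idea concrete, here is a minimal sketch of what a thesaurus-backed search might look like. Everything in it is an assumption for illustration: the "see also" groups, the sample passages, and the function names `expand_query` and `search` are all hypothetical, and a real system would draw on a maintained thesaurus rather than a hand-typed list.

```python
import re

# Hypothetical "see also" groups: terms in the same set are treated as equivalent.
SEE_ALSO_GROUPS = [
    {"sub", "hoagie", "hero"},
    {"automobile", "motorcar", "horseless carriage"},
]

# Made-up passages standing in for transcribed or OCR'd text.
PASSAGES = [
    "He ordered a hoagie at the corner shop.",
    "The horseless carriage frightened the horses.",
    "She mailed a letter to her cousin.",
]


def expand_query(term):
    """Return the term plus every term that shares a thesaurus group with it."""
    term = term.lower()
    expanded = {term}
    for group in SEE_ALSO_GROUPS:
        if term in group:
            expanded |= group
    return expanded


def search(passages, term):
    """Return passages containing the term or any of its thesaurus equivalents."""
    variants = expand_query(term)
    # Match whole words or phrases only, ignoring case.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(v) for v in sorted(variants)) + r")\b",
        re.IGNORECASE,
    )
    return [p for p in passages if pattern.search(p)]


if __name__ == "__main__":
    # A search for "sub" also finds the passage that says "hoagie".
    for hit in search(PASSAGES, "sub"):
        print(hit)
```

The point of the sketch is only that the searcher types today's word and the thesaurus supplies yesterday's. Keeping those groups current is the hard part, and that is where the automated systems mentioned above would come in.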

Most of us don't think about thesaurus creation. That is a task for someone else. But it could be that thesaurus creation, like metadata creation, will be a necessary component of future information access projects. And it is something that perhaps we need to be thinking about now.

Technorati tags: metadata, thesaurus
