Digitization 101: Taxonomies

Thursday, May 18, 2006

Taxonomies

Last night I listened to the audio of two presentations given at Computers in Libraries (CIL) on taxonomies -- Portal Taxonomy: A Case Study of MediaSleuth (Marjorie M.K. Hlava) and Taxonomy Tales (Jennifer Evert).

Jennifer Evert spoke about using people to help create a robust -- yet precise -- taxonomy for use in indexing and retrieval of content at LexisNexis. One thing that stood out to me is that the software tools also help the human indexers apply the terms correctly. She spoke of "editorial drift," which is what happens when indexers do not apply the terms consistently. Although we don't use that phrase, editorial drift is something digitization projects need to be aware of when creating metadata. Terms must be applied correctly and consistently.

Marjorie Hlava (Margie) is the President, Chairman, and founder of Access Innovations, Inc. The MediaSleuth web site says that the company "is a division of NICEM (National Information Center for Educational Media) and was developed in conjunction with Access Innovations in response to market conditions and requests from both sides of the educational and training media community." Part of what Margie talked about was using machine aided indexers (MAI). Quoting from her slides (in the collected presentations book):

M.A.I. suggests the correct terms from the taxonomy as descriptors
M.A.I. rulebase recognizes term equivalents

In other words, MAI can help to index materials more quickly and more accurately, once the rules have been created. Of course, those rules do require human input and humans are needed to help to keep the rules up-to-date.
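To make the two slide points concrete, here is a minimal sketch of rule-based term suggestion in Python. It is not how Access Innovations' M.A.I. actually works; the rule base, the preferred terms, and the function name suggest_terms are all hypothetical, invented only to illustrate how term equivalents can map back to a preferred taxonomy term.

```python
import re

# Hypothetical rule base (an assumption for illustration): each preferred
# taxonomy term maps to the equivalent words or phrases that should trigger it.
RULEBASE = {
    "Taxonomies": ["taxonomy", "taxonomies", "controlled vocabulary"],
    "Optical character recognition": ["ocr", "optical character recognition"],
    "Metadata": ["metadata", "cataloging data"],
}


def suggest_terms(text, rulebase=RULEBASE):
    """Return the preferred terms whose equivalents appear in the text, sorted."""
    lowered = text.lower()
    suggestions = set()
    for preferred, equivalents in rulebase.items():
        for phrase in equivalents:
            # Match whole words or phrases only, so "ocr" does not fire on "mediocre".
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                suggestions.add(preferred)
                break  # one matching equivalent is enough for this term
    return sorted(suggestions)


if __name__ == "__main__":
    sample = "We ran OCR on the scans and then created metadata by hand."
    print(suggest_terms(sample))
```

Because every indexer gets the same suggestions from the same rules, a tool along these lines is one way to guard against inconsistent term use. The rules themselves still have to be written and maintained by people.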

When most libraries think of creating metadata, they think of doing it manually. As our need to create metadata increases, we need to look at tools that will help us do it faster and smarter...and help us guard against editorial drift. Tools like those developed by LexisNexis and Access Innovations might be things that we would use.

In talking with Margie after CIL, I learned that Access Innovations does scanning and OCR as a way of helping their clients load content into databases. This is not their main focus, but it reminded me of how many companies have gotten involved in digitization. In this case, it is a way of helping their clients and maintaining their client base.

Technorati tag: CIL2006, taxonomy

2 comments:

Anonymous said...: Are those two presentations available on the CIL website? I did not find them.

Thanks.

Romain; 11:01 AM
Jill Hurst-Wahl said...: Romain, you're right. They are not on the CIL web site.

CIL audiotaped every presentation and then collected the presentations for the CDs that they sold. The information on the CIL web site was collected separately. The web site says "Links are provided solely at the discretion of presenters." Obviously some presenters decided not to put their materials on the CIL web site. However, the audio and PDF versions of the handouts are on the CDs for every (I think every) presentation.; 11:15 AM