In the May/June issue of Educause Review, Liz Liddy -- Trustee Professor in the School of Information Studies at Syracuse University and Director of its Center for Natural Language Processing -- talks about metadata creation and the ability to do it automatically. Her team has developed MetaExtract, which automatically generates and assigns metadata to electronic documents. Liz talks about using it on lesson plans, for example, and says that it performed well. (Which, knowing her, I believe.)
When creating metadata for Rochester Images, the cataloguers found that the work could be quite time-consuming. Part of the problem was the research needed to ensure that the right terms were assigned. Some of the information, however, was in the documents themselves, so an automatic metadata generator would have helped by leaving to the cataloguers only those things that could not be assigned automatically.
Books that are being digitized could benefit from this, as could lab notebooks and other documents that corporations, etc., are digitizing. It brings to mind some interesting possibilities for how this technology could be integrated into digitization systems.
I've known Liz since 1993 and worked with her for several years when we were both associated with MNIS/TextWise and its efforts to commercialize DR-LINK (a natural language processing retrieval system). Whatever application of natural language processing (NLP) Liz puts her mind to, she masters. Hopefully this is one that we'll see integrated into digital asset management software.