Thursday, July 05, 2012
Unified Digital Format Registry (UDFR)
The Unified Digital Format Registry (UDFR) is:
...a reliable, publicly accessible, and sustainable knowledge base of file format representation information for use by the digital preservation community.
A format is a set of semantic and syntactic rules governing the mapping between abstract information and its representation in digital form. While many worthwhile and necessary preservation activities can be performed on a digital asset without knowledge of its format, that is, merely as a sequence of bits, any higher-level preservation of the underlying information content must be performed in the context of the asset's format.
The UDFR seeks to "unify" the function and holdings of two existing registries, PRONOM and GDFR (the Global Digital Format Registry), in an open-source, semantically enabled, and community-supported platform.
The UDFR was developed by the University of California Curation Center (UC3) at the California Digital Library (CDL), with funding from the Library of Congress as part of its National Digital Information Infrastructure and Preservation Program (NDIIPP). The service is implemented on top of the OntoWiki semantic wiki and the Virtuoso triple store.
According to an email from Stephen Abrams (Associate Director, UC Curation Center, California Digital Library), the UDFR includes information about:
- 846 file formats
- 28 character encodings
- 17 compression algorithms
- 1,198 MIME types
- 548 external signatures (file extensions)
- 494 internal signatures (magic numbers)
- 268 software packages
- 156 agents
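To make the difference between external and internal signatures concrete, here is a minimal Python sketch of how a tool might use each kind to guess a file's format. The two small tables and the function names are hypothetical and exist only for illustration. They are not drawn from the UDFR, which holds far more entries and richer matching rules than a simple prefix check.

```python
from pathlib import Path
from typing import Optional

# Illustrative samples only; a real registry holds many more entries.
# Internal signatures: well-known magic numbers found at the start of a file.
INTERNAL_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"%PDF-": "PDF",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
}

# External signatures: file extensions, which are only a hint.
EXTERNAL_SIGNATURES = {".png": "PNG", ".pdf": "PDF", ".gif": "GIF"}


def identify_by_magic(data: bytes) -> Optional[str]:
    """Guess the format from the leading bytes of the content."""
    for magic, name in INTERNAL_SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def identify_by_extension(filename: str) -> Optional[str]:
    """Guess the format from the file name alone (case-insensitive)."""
    return EXTERNAL_SIGNATURES.get(Path(filename).suffix.lower())


# A file named like a PNG whose bytes say GIF: the two signatures disagree.
print(identify_by_extension("scan.png"))       # extension-based guess
print(identify_by_magic(b"GIF89a\x01\x00"))    # content-based guess
```

When the two guesses disagree, as in the last two lines, the internal signature is usually the more trustworthy one, since a file can be renamed without its contents changing.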
If you are involved at all in digital preservation, this is a site worth bookmarking.