Friday, July 15, 2005

The use of checksums in digitization projects

The digitizationblog has had two postings on this recently. In the first posting, Mark Jordan writes:
Many institutions use checksums to ensure that the files they are creating in digitization projects are integral. However, there appears to be little consistency in what types of checksums are being used.
At the end of the article, Jordan asks readers to tell him about their use of checksums.

The second posting points to the release of "the DSpace Checksum Checker, which monitors changes in digital objects in DSpace by comparing initial and recent checksums and reporting the results via email."
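The general idea of comparing initial and recent checksums can be sketched in a few lines of Python. This is only an illustration and not how the DSpace Checksum Checker is implemented: the function names file_checksum and find_changed are hypothetical, SHA-256 is an assumed choice of algorithm, and the email reporting step is left out.

```python
import hashlib


def file_checksum(path, algorithm="sha256"):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_changed(recorded):
    """Given a dict mapping file paths to their initial checksums,
    return the paths whose current checksum no longer matches."""
    return [
        path
        for path, initial in recorded.items()
        if file_checksum(path) != initial
    ]
```

In practice the initial checksums would be recorded when each file is created and kept somewhere separate from the files themselves, so that both are unlikely to be damaged at the same time.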

What is a checksum? Wikipedia states:
A checksum is a form of redundancy check, a very simple measure for protecting the integrity of data by detecting errors in data that is sent through space (telecommunications) or time (storage). It works by adding up the basic components of a message, typically the bytes, and storing the resulting value. Later, anyone can perform the same operation on the data, compare the result to the authentic checksum, and (assuming that the sums match) conclude that the message was probably not corrupted.
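The "adding up the bytes" approach in that definition can be shown with a minimal Python sketch. The function name simple_checksum is made up for this example, and keeping the total modulo 256 (so the result fits in one byte) is an assumption; the definition above does not say how large the stored value is.

```python
def simple_checksum(data: bytes) -> int:
    """Add up the bytes of a message and keep the total modulo 256.

    The modulo 256 step is an assumption made for this sketch.
    """
    return sum(data) % 256


message = b"digitization"
stored = simple_checksum(message)  # value saved alongside the data

# Later: repeat the operation and compare with the stored value.
received = b"digitizatioN"  # one byte altered in transit or storage
if simple_checksum(received) == stored:
    print("sums match: message was probably not corrupted")
else:
    print("sums differ: message was corrupted")
```

A sum this simple misses some errors. Two bytes swapped in order add up to the same total, for example, which is one reason digitization projects usually turn to stronger algorithms such as MD5 or SHA-1.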

