Friday, September 01, 2017

Are you digitizing what is true?

1940 Census publicity photo
We (the global we) are digitizing our history, including birth, death, marriage, census, and other records for a vast number of people.  Ancestry.com looks at these records and uses OCR and algorithms to make sense of them.  However, there are problems.  Records from the late 1800s and early 1900s are handwritten, which can make them difficult to interpret.  When a census records only a person's age, the birth year has to be guessed from that age, and the guess has a 50% chance of being correct, because the person may or may not have had a birthday yet that year.  Then there is the problem of names and whether the name is correct.  One hundred years ago, people knew one another and didn't care if a name was misspelled or just plain wrong.  Now, however, all of these potential errors are causing problems.
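To make the birth-year guess concrete, here is a minimal sketch in Python. The function names are my own, and I'm using April 1, 1940 (the official date of the 1940 census) purely as an example. It is an illustration of the arithmetic, not a description of how Ancestry.com actually does it.

```python
from datetime import date


def possible_birth_years(census_date: date, age: int) -> tuple[int, int]:
    """Return the two birth years consistent with an age recorded on census_date.

    Assumes the recorded age is the age at the person's last birthday
    and that it was written down correctly.
    """
    # Birthday still to come in the census year -> earlier birth year.
    # Birthday already passed (or on census day) -> later birth year.
    return (census_date.year - age - 1, census_date.year - age)


def age_on(birth: date, census_date: date) -> int:
    """Age at last birthday on census_date (hypothetical helper for checking)."""
    had_birthday = (census_date.month, census_date.day) >= (birth.month, birth.day)
    return census_date.year - birth.year - (0 if had_birthday else 1)


census_day = date(1940, 4, 1)
print(possible_birth_years(census_day, 30))   # two candidate years
print(age_on(date(1909, 12, 25), census_day))  # birthday later in 1940
print(age_on(date(1910, 2, 3), census_day))    # birthday earlier in 1940
```

Two people born in different years can both be listed with the same age, so a single age never pins down a single birth year. Any system that shows only one year is picking one of the two candidates.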

We cannot go through every line of data that is being digitized, compare it to other data, and then correct it.  While the data would be more accurate, the process would be too time-consuming and costly.  Ancestry.com (and, I'm sure, other sites) allows people to compile information and make corrections on their "copy."  This is a wonderful solution if the person knows the data is wrong, but what if the person has no idea?

This topic came to mind because I'm researching my family tree and the data isn't always close to being accurate. Thankfully, I know enough about the family tree to be able to make intelligent decisions about the data I'm using (or so I hope).  But I cannot go in and correct what I know is blatantly wrong, and that is frustrating.

If you are digitizing material today and making it available, or even archiving born-digital materials:
  • How do you know that the information is accurate?  
  • What do you need to tell people about the data, which might help them understand its potential lack of accuracy?  
  • Can you build in a feedback mechanism that would allow people to provide corrections?
Site of Steinway Hall, W. 57th (LOC)
Yes, I know people are thinking about this.  I also know that people are creating systems that do allow for user-generated comments, descriptions, and tagging.  People are also doing this on the Internet in places like Flickr.  You see this, for example, with the historic photos that have been uploaded by the Library of Congress.  If you check the photo on the right, you'll see interesting and useful comments. Can we do more of this?


