Monday, October 03, 2011

What we learned from 5 million books (video)

The research in this 14-minute TED video was made possible by Google's book digitization efforts. The two researchers present serious findings in a fun way and demonstrate that reading isn't the only thing these digitized books are good for.

Comments:

Martin Locock said...

It hints at the potential, but there is a fatal flaw: the chronological data for books scanned by Google is of very low reliability, because it is based mainly on software picking dates out of the OCR text of each volume's preliminary pages. This results in numerous anomalies with reprints, new editions, and books with dates in the title, which are assigned to the wrong date. It goes to show that good-quality metadata prior to scanning is essential to producing digitised content that can serve as the basis for linguistic research.
