Wednesday, April 11, 2007


Google is helping to develop OCRopus, and has issued a press release about the project. The OCRopus web site describes it as:

...a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities.

The web site goes on to say:

The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-90's and deployed by the US Census bureau, and novel high-performance layout analysis methods.

OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications.

An alpha release is scheduled for the third quarter of this year, so it may be a while before we can benefit from it. Still, it is good to see a major company backing an open source OCR project.

