...a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities.
The web site goes onto say:
An alpha release of the product is scheduled for the third quarter of this year, so it looks like our benefiting from this may be a "ways off." However, it is good to see a major company working on this open source product.
The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-90's and deployed by the US Census bureau, and novel high-performance layout analysis methods.
OCRopus is development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications.
Technorati tags: Google, OCR