Optimizing OCR Accuracy on Older Documents: A Study of Scan Mode, File Enhancement, and Software Products, by Jon M. Booth and Jeremy Gelb, Revised June 2006 (v.2)
This document is technical and not an overview of OCR, so it is not for everyone. The conclusions, though, are interesting:
What is amazing is that they achieved 98-99% character accuracy with, seemingly, little fuss.
In conclusion, the combination of these facts demonstrates that file enhancement is not needed: recognition rates are already at an acceptable level and, more importantly, enhancement does not improve character recognition rates for OCR.
- Older and discolored documents must be scanned in RGB mode to capture all the image data, and to maximize OCR accuracy.
- The character accuracy produced by scanning older documents in RGB mode meets the 99% OCR accuracy requirement set at GPO’s meeting of the experts, even without applying file enhancement.
- No single type of file enhancement, applied individually, improves character recognition rates for OCR.
- Specifically, the Downsampling enhancement type does not improve character recognition rates, despite OCR software manufacturers’ claims that 300 dpi is optimal for recognition.
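The post does not say exactly how the study computed "character accuracy," so the sketch below assumes a common definition: (n - errors) / n, where n is the number of characters in the ground-truth text and errors is the Levenshtein edit distance (insertions, deletions, substitutions) between the ground truth and the OCR output. The function names `levenshtein` and `character_accuracy` are hypothetical illustrations, not anything from the study.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions
    needed to turn string a into string b (classic two-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Assumed metric: (n - edit_distance) / n, clamped at 0 when the OCR output
    contains more errors than the ground truth has characters."""
    n = len(ground_truth)
    if n == 0:
        raise ValueError("ground truth must be non-empty")
    errors = levenshtein(ground_truth, ocr_output)
    return max(0.0, (n - errors) / n)


if __name__ == "__main__":
    truth = "The quick brown fox"
    ocr = "Tne quick brown f0x"  # two substitutions: h->n, o->0
    print(f"{character_accuracy(truth, ocr):.4f}")
```

Under this definition, a 99% requirement on a page of 1,000 characters tolerates roughly 10 character errors, which gives a feel for how strict the threshold in the conclusions is.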