29 January 2007 A multi-evidence, multi-engine OCR system
Ilya Zavorin, Eugene Borovikov, Anna Borovikov, Luis Hernandez, Kristen Summers, Mark Turner
Proceedings Volume 6500, Document Recognition and Retrieval XIV; 650005 (2007) https://doi.org/10.1117/12.703106
Event: Electronic Imaging 2007, San Jose, CA, United States
Abstract
Although modern OCR technology can handle a wide variety of document images, no single OCR engine performs equally well on all documents in a given language script. Each engine has its own strengths and weaknesses, so different engines tend to differ in accuracy across documents and in the errors they make on the same document image. While the idea of using multiple OCR engines to boost output accuracy is not new, most existing systems do not go beyond variations on majority voting. This approach may work well in many cases, but it has limitations, especially when the OCR technology for a given script has not yet fully matured. Our goal is to develop a system called MEMOE (for "Multi-Evidence Multi-OCR-Engine") that combines, in an optimal or near-optimal way, the output streams of one or more OCR engines with various types of evidence extracted from these streams and from the original document images, to produce output of higher quality than that of the individual OCR engines or of majority voting applied to multiple OCR output streams. Furthermore, we aim to improve OCR accuracy on images whose output would otherwise be inaccurate enough to significantly impair downstream processing. The MEMOE system functions as an OCR engine, taking document images and some configuration parameters as input and producing a single output text stream. In this paper, we describe the design of the system, the various evidence types, and how they are incorporated into MEMOE in the form of filters. Results of initial tests on two corpora of Arabic documents show that, even in its initial configuration, the system is superior to a voting algorithm and that further improvement may be achieved by incorporating additional evidence types.
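For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of majority voting over several OCR output streams, plus a weighted variant in which a per-engine score stands in for one simple kind of "evidence". It is not the MEMOE algorithm, which the abstract does not specify. The function names, the engine weights, and the assumption that the streams are already aligned token by token are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def majority_vote(streams: Sequence[Sequence[str]]) -> List[str]:
    """Pick, at each position, the token that most engines agree on.

    Assumption: the streams are pre-aligned, one token per position.
    Real OCR outputs would need an alignment step before voting.
    Ties go to the earliest-listed engine.
    """
    if not streams:
        return []
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("streams must be pre-aligned to equal length")
    result = []
    for candidates in zip(*streams):
        counts = {}
        for tok in candidates:
            counts[tok] = counts.get(tok, 0) + 1
        best = max(counts.values())
        # First candidate, in engine order, that reaches the top count.
        result.append(next(t for t in candidates if counts[t] == best))
    return result


def weighted_vote(streams: Sequence[Sequence[str]],
                  weights: Sequence[float]) -> List[str]:
    """Like majority_vote, but each engine's vote counts `weights[i]`.

    The weights are a hypothetical stand-in for extra evidence, such as
    a per-engine reliability estimate for the script being processed.
    """
    if len(streams) != len(weights):
        raise ValueError("need exactly one weight per stream")
    if not streams:
        return []
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("streams must be pre-aligned to equal length")
    result = []
    for candidates in zip(*streams):
        scores = {}
        for tok, w in zip(candidates, weights):
            scores[tok] = scores.get(tok, 0.0) + w
        best = max(scores.values())
        result.append(next(t for t in candidates if scores[t] == best))
    return result


# Three hypothetical engine outputs for the same line of text.
engine_a = ["the", "cat", "sat"]
engine_b = ["the", "cot", "sat"]
engine_c = ["tha", "cat", "sot"]

voted = majority_vote([engine_a, engine_b, engine_c])
weighted = weighted_vote([engine_a, engine_b, engine_c], [0.2, 0.2, 0.9])
```

With equal votes, the two agreeing engines win at every position. In the weighted call, the heavily weighted third engine overrides the other two wherever it disagrees with them. This shows both why extra evidence can change the outcome and why poorly chosen evidence could make it worse.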
© (2007) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ilya Zavorin, Eugene Borovikov, Anna Borovikov, Luis Hernandez, Kristen Summers, and Mark Turner "A multi-evidence, multi-engine OCR system", Proc. SPIE 6500, Document Recognition and Retrieval XIV, 650005 (29 January 2007); https://doi.org/10.1117/12.703106
KEYWORDS
Optical character recognition
Associative arrays
Image processing
Analytical research
Matrices
Computing systems
Imaging systems