Paper
18 December 2001 N-gram language models for document image decoding
Gary E. Kopec, Maya R. Said, Kris Popat
Author Affiliations +
Proceedings Volume 4670, Document Recognition and Retrieval IX; (2001) https://doi.org/10.1117/12.450728
Event: Electronic Imaging, 2002, San Jose, California, United States
Abstract
This paper explores the problem of incorporating linguistic constraints into document image decoding, a communication theory approach to document recognition. Probabilistic character n-grams (n=2--5) are used in a two-pass strategy where the decoder first uses a very weak language model to generate a lattice of candidate output strings. These are then re-scored in the second pass using the full language model. Experimental results based on both synthesized and scanned data show that this approach is capable of improving the error rate by a factor of two to ten depending on the quality of the data and the details of the language model used.
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Gary E. Kopec, Maya R. Said, and Kris Popat "N-gram language models for document image decoding", Proc. SPIE 4670, Document Recognition and Retrieval IX, (18 December 2001); https://doi.org/10.1117/12.450728
Lens.org Logo
CITATIONS
Cited by 12 scholarly publications.
Advertisement
Advertisement
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Data modeling

Communication theory

Back to Top