Paper
Empirical performance evaluation of page segmentation algorithms
22 December 1999
Proceedings Volume 3967, Document Recognition and Retrieval VII; (1999) https://doi.org/10.1117/12.373507
Event: Electronic Imaging, 2000, San Jose, CA, United States
Abstract
Document page segmentation is a crucial preprocessing step in Optical Character Recognition (OCR) systems. While numerous segmentation algorithms have been proposed, there is relatively little literature on the comparative evaluation, empirical or theoretical, of these algorithms. We use the following five-step methodology to quantitatively compare the performance of page segmentation algorithms: (1) we create mutually exclusive training and test datasets with groundtruth; (2) we select a meaningful and computable performance metric; (3) we use an optimization procedure to automatically search for the optimal parameter values of each segmentation algorithm; (4) we evaluate the segmentation algorithms on the test dataset; and (5) we perform a statistical error analysis to establish the statistical significance of the experimental results. We apply this methodology to five segmentation algorithms, three of which are representative research algorithms and two of which are well-known commercial products. The three research algorithms evaluated are Nagy's X-Y cut, O'Gorman's Docstrum, and Kise's Voronoi-diagram-based algorithm. The two commercial products evaluated are Caere Corporation's segmentation algorithm and ScanSoft Corporation's segmentation algorithm. The evaluations are conducted on 978 images from the University of Washington III dataset. We find that the performances of the Voronoi-based, Docstrum, and Caere segmentation algorithms are not significantly different from each other, but they are significantly better than that of ScanSoft's segmentation algorithm, which in turn is significantly better than that of the X-Y cut algorithm. Furthermore, the commercial and research segmentation algorithms have comparable performance.
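The five-step methodology in the abstract can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic pages, the toy gap-threshold segmenter, the zone-count error metric, the use of plain grid search as the optimization procedure, and the paired t statistic as the significance measure. The abstract does not specify which metric, optimizer, or statistical test the authors used.

```python
import math
import random
from statistics import mean, stdev


def split_dataset(pages, train_fraction=0.5, seed=0):
    """Step 1: shuffle, then cut into disjoint training and test sets."""
    shuffled = list(pages)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def zone_count_error(predicted_zones, page):
    """Step 2 (toy metric, an assumption): absolute error in the zone count."""
    return abs(predicted_zones - page["truth"])


def toy_segmenter(page, threshold):
    """Hypothetical algorithm: start a new zone at every gap wider than `threshold`."""
    return 1 + sum(1 for gap in page["gaps"] if gap > threshold)


def per_page_errors(threshold, pages):
    """Run the toy segmenter on each page and score it against groundtruth."""
    return [zone_count_error(toy_segmenter(p, threshold), p) for p in pages]


def tune(candidates, train_pages):
    """Step 3: grid search for the threshold with the lowest mean training error."""
    return min(candidates, key=lambda t: mean(per_page_errors(t, train_pages)))


def paired_t_statistic(errors_a, errors_b):
    """Step 5: paired t statistic computed on per-page error differences."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all paired differences are equal; t is undefined")
    return mean(diffs) / (spread / math.sqrt(len(diffs)))


def make_page(n_zones):
    """Synthetic page: two 5-pixel line gaps per zone, 30-pixel gaps between zones."""
    gaps = [5, 5] * n_zones + [30] * (n_zones - 1)
    return {"gaps": gaps, "truth": n_zones}


pages = [make_page(1 + i % 4) for i in range(20)]
train_pages, test_pages = split_dataset(pages)               # step 1

best_threshold = tune([2, 10, 50], train_pages)              # step 3
tuned_errors = per_page_errors(best_threshold, test_pages)   # step 4
untuned_errors = per_page_errors(2, test_pages)              # poorly tuned baseline
t_value = paired_t_statistic(tuned_errors, untuned_errors)   # step 5
```

Parameters are tuned only on the training pages and errors are compared only on the held-out test pages, which is the point of keeping the two sets mutually exclusive. A negative `t_value` here means the tuned configuration made smaller per-page errors than the baseline; judging significance would additionally require comparing it against a t distribution with `len(test_pages) - 1` degrees of freedom.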
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Song Mao and Tapas Kanungo, "Empirical performance evaluation of page segmentation algorithms," Proc. SPIE 3967, Document Recognition and Retrieval VII (22 December 1999); https://doi.org/10.1117/12.373507
CITATIONS
Cited by 10 scholarly publications.
KEYWORDS
Image segmentation
Optical character recognition
Detection and tracking algorithms
Error analysis
Image processing algorithms and systems
Statistical analysis
Performance modeling

RELATED CONTENT

The OCRopus open source OCR system
Proceedings of SPIE (January 28 2008)
Stochastic modeling in image segmentation
Proceedings of SPIE (September 24 1998)
Evaluation and error detection in digital image segmentation
Proceedings of SPIE (December 09 1992)
