KEYWORDS: Hough transforms, Time-frequency analysis, Picosecond phenomena, Image segmentation, Signal analyzers, Detection and tracking algorithms, Electronic imaging, Current controlled current source, Machine learning, Medicine
This paper presents a novel approach to extracting multi-oriented text lines from historical handwritten
Arabic documents. Because the lines are multi-oriented and dispersed across the page, we use an image
paving algorithm that determines the lines progressively and locally. The paving algorithm is initialized with
a small window, whose size is then extended until enough lines and connected components are
found. We use a Snake (active contour model) for line extraction. Once the paving is established, the orientation is determined using
the Wigner-Ville distribution on the projection-profile histogram. This local orientation is then extended to constrain
the orientation in the neighborhood. Afterwards, the text lines are extracted locally in each zone, based on
baseline following and the proximity of connected components. Finally, the connected components
that overlap and touch in adjacent lines are separated, taking into account the morphology of the terminal letters of
Arabic words. The proposed approach was tested on 100 documents and reached a
separation accuracy of about 98.6%.
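
The window-extension step of the paving can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`count_components`, `grow_window`), the square window shape, the fixed growth step, and the stopping rule based only on a connected-component count (the abstract also mentions a line count) are all assumptions made for illustration.

```python
from collections import deque


def count_components(img, top, left, size):
    """Count 8-connected foreground components inside a square window.

    img is a binary image given as a list of rows (1 = ink, 0 = background).
    The window is clipped to the image borders.
    """
    rows = range(top, min(top + size, len(img)))
    cols = range(left, min(left + size, len(img[0])))
    seen = set()
    count = 0
    for r in rows:
        for c in cols:
            if img[r][c] and (r, c) not in seen:
                # New component: flood-fill it with a breadth-first search.
                count += 1
                seen.add((r, c))
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (ny in rows and nx in cols
                                    and img[ny][nx]
                                    and (ny, nx) not in seen):
                                seen.add((ny, nx))
                                queue.append((ny, nx))
    return count


def grow_window(img, top, left, init_size=2, step=2, min_components=3):
    """Extend a small window until it holds enough connected components.

    init_size, step and min_components are hypothetical parameters,
    not values taken from the paper.
    """
    size = init_size
    max_size = max(len(img) - top, len(img[0]) - left)
    while (size < max_size
           and count_components(img, top, left, size) < min_components):
        size += step
    return min(size, max_size)


# Toy 6x6 page with three isolated ink pixels.
page = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
window_size = grow_window(page, 0, 0, init_size=2, step=2, min_components=3)
```

In this toy example the window starts at 2x2 and grows in steps of 2 until it contains three components or reaches the page border. A real paving would repeat this for each zone and would presumably also check the number of lines found, as the abstract describes.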