In this paper, we present our new method for the segmentation of handwritten text pages into lines, which was submitted to the ICDAR 2013 handwriting segmentation competition. The method is based on two levels of perception of the image: a rough perception based on a blurred image, and a precise perception based on the presence of connected components. Combining these two levels of perception makes it possible to deal with the difficulties of handwritten text segmentation: curvature, irregular slope and overlapping strokes. The analysis of the blurred image is efficient on images with a high density of text, whereas the connected components make it possible to link up text lines in pages with low text density. The combination of these two kinds of data is implemented with a grammatical description, which externalizes the knowledge linked to the page model. The page model contains an analysis strategy that can be associated with an applicative goal; indeed, text line segmentation depends on the kind of data being analysed: homogeneous text pages, separated text blocks or unconstrained text. This method obtained a recognition rate of more than 98% in the ICDAR 2013 competition.
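As a rough illustration of the blurred-image level (a hedged sketch, not the authors' implementation: the function names, the moving-average blur and the thresholds are assumptions), text lines in a dense page can be localized by smoothing the row ink profile of a binary image and keeping the bands where the smoothed profile stays high:

```python
def smooth(profile, radius=1):
    # moving-average "blur" of the row ink profile
    n = len(profile)
    return [sum(profile[max(0, i - radius):min(n, i + radius + 1)]) /
            (min(n, i + radius + 1) - max(0, i - radius))
            for i in range(n)]

def line_bands(image, threshold=0.5):
    # image: binary raster (list of rows of 0/1); a text line is a maximal
    # run of rows whose smoothed ink count exceeds the threshold
    profile = [sum(row) for row in image]
    sm = smooth(profile)
    bands, start = [], None
    for i, v in enumerate(sm):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            bands.append((start, i - 1))
            start = None
    if start is not None:
        bands.append((start, len(sm) - 1))
    return bands
```

On a real page the blur radius and threshold would have to be tuned to the interline spacing; the connected-component level described in the abstract would then refine these rough bands.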
The analysis of 2D structured documents often requires localizing data inside a document during the recognition process. In this paper we present LearnPos, a new generic tool, independent of any document recognition system, which models and evaluates positioning from a learning set of documents. LearnPos helps the user define the physical structure of the document, so that they can concentrate their efforts on defining its logical structure. LearnPos provides spatial information for both absolute and relative spatial relations, in interaction with the user. Our method can handle spatial relations composed of distinct zones, and determines an appropriate order and point of view to minimize errors. We show that the resulting models can be successfully used for structured document recognition, while reducing the manual exploration of the document data set.
Boosting over decision stumps has proved efficient in Natural Language Processing, essentially with symbolic features, and its good properties (speed, few and non-critical parameters, robustness to over-fitting) could be of great interest in the numeric world of pixel images. In this article we investigate the use of boosting over small decision trees for image classification, namely the discrimination of handwritten from printed text. Experiments comparing it to the usual SVM-based classification show convincing results: performance is very close, while predictions are faster and the classifier behaves far less like a black box. These promising results encourage the use of this classifier in more complex recognition tasks such as multiclass problems.
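A minimal sketch of boosting over stumps (depth-1 trees) on a single numeric feature; the dataset, the feature and all names here are illustrative assumptions, not the article's actual setup:

```python
import math

def train_stump(X, y, w):
    # exhaustive search for the weighted-error-minimising decision stump
    # on a single numeric feature (labels are +1 / -1)
    best = None
    for thr in sorted(set(X)):
        for pol in (1, -1):
            pred = [pol if x >= thr else -pol for x in X]
            err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best

def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = train_stump(X, y, w)
        err = max(err, 1e-10)  # avoid log/division blow-up on perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, pol))
        # up-weight misclassified samples, then renormalise
        w = [wi * math.exp(-alpha * yi * (pol if x >= thr else -pol))
             for wi, x, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    # weighted vote of the stumps
    score = sum(a * (pol if x >= thr else -pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```

In the printed/handwritten setting the feature vector would of course be multidimensional (e.g. pixel or stroke statistics) and the stumps would pick a feature per round; the weighting and voting mechanics are unchanged.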
The transcription of handwritten words remains a challenging task. When processing full pages, approaches are limited by the trade-off between automatic recognition errors and the tedious aspect of human verification. In this article, we present our investigations into improving the capabilities of an automatic recognizer so that it can reject unknown words (and thus avoid wrong decisions) while limiting incorrect rejections (i.e., recognizing as much as possible from the lexicon of known words).
This is the active research topic of developing a verification system that optimizes the trade-off between performance and reliability. To minimize recognition errors, a verification system is usually used to accept or reject the hypotheses produced by an existing recognition system. We therefore reuse our verification architecture here: the recognition hypotheses are re-scored by a set of support vector machines, and validated by a verification mechanism based on multiple rejection thresholds. To tune these (class-dependent) rejection thresholds, we propose an algorithm based on dynamic programming that maximizes the recognition rate for a given error rate.
Experiments have been carried out on the RIMES database in three steps. The first two showed that this approach performs at least as well as other state-of-the-art rejection methods. Here we focus on the third, which shows that this verification system also greatly improves keyword extraction from a set of handwritten words, with strong robustness to lexicon size variations (21 lexicons were tested, from 167 entries up to 5,600 entries). This robustness is particularly relevant to our application context of cooperation with humans, and is made possible only by the rejection ability of the proposed system. Compared to an HMM with simple rejection, the proposed verification system improves the recognition rate on average by 57% (resp. 33% and 21%) for a given error rate of 1% (resp. 5% and 10%).
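A minimal sketch of the accept/reject step with class-dependent thresholds (the hypothesis format and all names are assumptions for illustration; the dynamic-programming tuning of the thresholds is not reproduced here):

```python
def evaluate(hyps, thresholds):
    # hyps: list of (predicted_class, score, true_class) triples;
    # a hypothesis is accepted when its (e.g. SVM) score reaches the
    # rejection threshold of its predicted class.
    # recognition rate = accepted-and-correct / total
    # error rate       = accepted-but-wrong  / total (rejections are neutral)
    total = len(hyps)
    correct = sum(1 for c, s, t in hyps
                  if s >= thresholds.get(c, 0.0) and c == t)
    wrong = sum(1 for c, s, t in hyps
                if s >= thresholds.get(c, 0.0) and c != t)
    return correct / total, wrong / total
```

Tuning then amounts to searching, over the per-class thresholds, for the setting that maximizes the first returned rate while keeping the second below the target error rate.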
Document analysis and recognition systems often fail to produce results of sufficient quality when processing old and damaged document sets, and require manual corrections to improve their results. This paper presents how, using the iterative analysis of document pages we recently proposed, we can implement a spontaneous interaction model suitable for mass document processing. It enables human operators to detect and correct errors made by the automatic system, and reintegrates their corrections into subsequent steps of the iterative analysis process. A page analyzer can thus reprocess erroneous parts and those which depend on them, avoiding the need to manually fix, during post-processing, all the consequences of errors made by the automatic system. After presenting the global system architecture and a prototype implementation of our proposal, we show that the document model can be simply enriched to enable the proposed spontaneous interaction model. We illustrate its use on a practical example: correcting under-segmentation issues during the localization of numbers in documents from the 18th century. Evaluations conducted on this example, on 50 pages containing 1637 numbers to localize, show that the proposed interaction model can reduce human workload (29.8% fewer elements to provide) for a given target quality level, compared to manual post-processing.
This paper presents a new method to address the problem of segmenting handwritten text into text lines and words. We propose a method based on the cooperation of several points of view: text lines are first localized in a low-resolution image, and the pixels are then assigned to them at a higher level of resolution. Thanks to this combination of levels of vision, we can detect overlapping characters and re-segment the connected components during the analysis. We then propose a segmentation of lines into words based on the cooperation between numerical data and symbolic knowledge. The numerical data are obtained from distances in a Delaunay graph, which gives a precise distance between connected components at the pixel level. We introduce structural rules in order to take into account generic knowledge about the organization of a text page. This cooperation between kinds of information gives greater expressive power and ensures the global coherence of the recognition. We validate this work using the metrics and the database proposed for the ICDAR 2009 segmentation contest, showing that our method compares favorably with other methods in the literature. In particular, we are able to deal with slope and curvature, overlapping text lines and varied writing styles, which are the main difficulties met by the other methods.
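To illustrate the Delaunay-distance idea (a hedged sketch only: the brute-force O(n^4) triangulation and the fixed gap threshold are illustrative stand-ins, not the paper's method), connected-component centroids can be triangulated, and components joined into words when a Delaunay edge is shorter than a word-gap threshold:

```python
import itertools
import math

def delaunay_edges(pts):
    # brute-force Delaunay: a triangle belongs to the triangulation when no
    # other point lies strictly inside its circumcircle (fine for a sketch)
    def incircle(a, b, c, p):
        # positive determinant (sign-corrected for orientation) means
        # p lies inside the circumcircle of triangle (a, b, c)
        m = [[a[0]-p[0], a[1]-p[1], (a[0]-p[0])**2 + (a[1]-p[1])**2],
             [b[0]-p[0], b[1]-p[1], (b[0]-p[0])**2 + (b[1]-p[1])**2],
             [c[0]-p[0], c[1]-p[1], (c[0]-p[0])**2 + (c[1]-p[1])**2]]
        det = (m[0][0] * (m[1][1]*m[2][2] - m[1][2]*m[2][1])
             - m[0][1] * (m[1][0]*m[2][2] - m[1][2]*m[2][0])
             + m[0][2] * (m[1][0]*m[2][1] - m[1][1]*m[2][0]))
        orient = (b[0]-a[0]) * (c[1]-a[1]) - (b[1]-a[1]) * (c[0]-a[0])
        return det * orient > 0
    edges = set()
    for i, j, k in itertools.combinations(range(len(pts)), 3):
        a, b, c = pts[i], pts[j], pts[k]
        if (b[0]-a[0]) * (c[1]-a[1]) - (b[1]-a[1]) * (c[0]-a[0]) == 0:
            continue  # degenerate (collinear) triple
        if not any(incircle(a, b, c, pts[l])
                   for l in range(len(pts)) if l not in (i, j, k)):
            edges.update([(i, j), (i, k), (j, k)])
    return edges

def word_groups(pts, gap):
    # union-find over Delaunay edges shorter than the word-gap threshold
    parent = list(range(len(pts)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j in delaunay_edges(pts):
        if math.dist(pts[i], pts[j]) < gap:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(pts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())
```

In the paper's setting the distances are measured at the pixel level between component contours rather than centroids, and the threshold is combined with the structural rules rather than fixed; the graph structure plays the same role in both cases.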
This paper presents an improvement to a document layout analysis system, offering a possible solution to Sayre's paradox ("a letter must be recognized before it can be segmented; and it must be segmented before it can be recognized"). This improvement, based on stochastic parsing, allows statistical information obtained from recognizers to be integrated during syntactic layout analysis. We present how this fusion of numeric and symbolic information in a feedback loop can be applied to syntactic methods to simplify document description. To limit combinatorial explosion during the exploration of solutions, we devised an operator that allows optional activation of the stochastic parsing mechanism. Our evaluation on 1250 handwritten business letters shows that this method improves global recognition scores.
This work addresses the problem of document image analysis, and more particularly the topic of document structure recognition in old, damaged and handwritten documents. The goal of this paper is to show the relevance of human perceptive vision to document analysis. We focus on two aspects of the model of perceptive vision, the perceptive cycle and visual attention, and present the key elements of perceptive vision that can be used for document analysis.
We then introduce perceptive vision into an existing method for document structure recognition, which allows us both to show how we used the properties of perceptive vision and to compare the results obtained with and without it. We apply our method to the analysis of several kinds of documents (archive registers, old newspapers, incoming mail, etc.) and show that perceptive vision significantly improves their recognition. Moreover, the use of perceptive vision simplifies the description of complex documents. Finally, running time is often reduced.
Collections of documents are sets of heterogeneous documents linked by shared structural and semantic properties, such as a specific ancient book series. A particular collection contains document images with specific physical layouts, like text pages or full-page illustrations, appearing in a specific order. Its contents, like journal articles, may be spread over several pages, not necessarily consecutive, producing strong dependencies between page interpretations. In order to build an analysis system that can bring contextual information from the collection to the appropriate recognition modules for each page, we propose to express the structural and semantic properties of a collection with a definite clause grammar. This is made possible by representing collections as streams of document images, and by using extensions to the formalism that we present here. We are then able to automatically generate a parser dedicated to a collection. Besides allowing structural variations and complex information flows, we show that this approach also enables the design of analysis stages, on a document or a set of documents. The benefit of using context is illustrated with several examples and their formalization in this framework.
This paper presents a system to extract the logical structure of handwritten mail documents. It consists of two joint tasks: segmenting documents into blocks and labeling those blocks. The main label classes considered are: addressee details, sender details, date, subject, text body, and signature. This work has to cope with the difficulties of unconstrained handwritten documents: variable structure and variable writing.
We propose a method based on a geometric analysis of the arrangement of elements in the document. We describe the document using a two-dimensional grammatical formalism, which makes it easy to introduce knowledge about mail into a generic parser. Our grammatical parser is LL(k), which means that several combinations are tried before the right one is selected. The main interest of this approach is that we can deal with loosely structured documents. Moreover, as the segmentation into blocks often depends on the associated classes, our method is able to retry a different segmentation until labeling succeeds.
We validated this method in the context of the French national project RIMES, which organized a contest on a large document base. We obtain a recognition rate of 91.7% on 1150 images.