In this paper, we present a method to extract text lines from poorly structured documents. These text lines may have different orientations, considerably curved shapes, and possibly a few wide inter-word gaps; such lines appear in posters, address blocks, and artistic documents. Our method is an extension of traditional perceptual grouping, with novel solutions to the problems of insufficient seed points and varying orientations within a single line. We assume that each text line consists of connected components, where each connected component is a set of black pixels belonging to a single letter or to several touching letters. In our scheme, connected components closer than an iteratively incremented threshold are combined into chains of connected components. Elongated chains are identified as the seed chains of lines. The seed chains are then extended to the left and to the right according to the local orientations, which are reevaluated at each end of a chain as it is extended. Through this process, all text lines are finally constructed. The advantage of the proposed method over prior work on curved text-line extraction is that it is not restricted to a specific language and can extract text lines containing wide inter-word gaps. In our experiments, the proposed method performs well on considerably curved text lines from logos and slogans, achieving 98% accuracy for straight-line extraction and 94% for curved-line extraction.
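The following is a minimal Python sketch of the chain-growing idea outlined in the abstract, not the authors' implementation: the distance measure, the elongation test, and the threshold schedule are simplifying assumptions introduced here for illustration, and component extraction from the binary image is taken as given.

```python
# Sketch of iterative chain grouping with an incremented distance threshold.
# Assumes `centroids` is an (N, 2) array of connected-component centers.
import numpy as np


def group_into_chains(centroids, threshold):
    """Union components whose centroid distance is below `threshold`."""
    n = len(centroids)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(centroids[i] - centroids[j]) < threshold:
                parent[find(i)] = find(j)

    chains = {}
    for i in range(n):
        chains.setdefault(find(i), []).append(i)
    return list(chains.values())


def is_elongated(chain, centroids, min_len=4, min_aspect=3.0):
    """Treat a chain as a seed if it is long and roughly line-shaped (via PCA)."""
    if len(chain) < min_len:
        return False
    pts = centroids[chain]
    cov = np.cov((pts - pts.mean(axis=0)).T)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # variances along/across the chain
    return eigvals[1] < 1e-9 or eigvals[0] / eigvals[1] >= min_aspect ** 2


def local_orientation(pts, k=4):
    """Unit direction of the last k centroids of a growing chain."""
    tail = pts[-k:]
    d = tail - tail.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(d.T))
    return vecs[:, -1]  # dominant eigenvector gives the local line direction


def find_seed_chains(centroids, start=10.0, step=5.0, max_threshold=60.0):
    """Raise the grouping threshold iteratively until elongated seed chains appear."""
    threshold = start
    while threshold <= max_threshold:
        chains = group_into_chains(centroids, threshold)
        seeds = [c for c in chains if is_elongated(c, centroids)]
        if seeds:
            return seeds, threshold
        threshold += step
    return [], threshold
```

In a full pipeline, each seed chain would then be grown at both ends by attaching nearby components that lie along the direction returned by `local_orientation`, recomputing that direction after every attachment, which is what allows a single line to follow a curved path.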
We propose a system that detects the current speaker in multi-speaker videoconferencing by using lip motion. First, the system detects the face and lip region of each candidate speaker using face color and shape information. Then, to identify the current speaker, it measures the change in the lip region between the current frame and the previous frame. To close up on the detected speaker, we use two CCD cameras: a general CCD camera and a PTZ camera controlled over an RS-232C serial port. Experimental results show that the proposed system can detect the face of the current speaker in a video feed with more than three people, regardless of face orientation. The system takes only 4 to 5 seconds to zoom in on the speaker from the initial reference image. It also makes image transmission more efficient for applications such as videoconferencing and internet broadcasting, because it provides a close-up face image at a resolution of 320x240 while simultaneously transmitting the whole background image.
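Below is a minimal sketch of lip-motion-based speaker selection under assumptions that differ from the paper: OpenCV's Haar face detector stands in for the color/shape-based face detection, and the lip region is approximated as the lower third of each face box. PTZ camera control over RS-232C is not covered.

```python
# Pick the face whose lip region changes most between consecutive frames.
import cv2
import numpy as np


def lip_region(face):
    """Lower third of a detected face box as a rough lip region (assumption)."""
    x, y, w, h = face
    return x, y + 2 * h // 3, w, h // 3


def current_speaker(prev_gray, curr_gray, faces):
    """Index of the face with the largest lip-region frame difference, or -1."""
    diff = cv2.absdiff(curr_gray, prev_gray)
    scores = []
    for face in faces:
        x, y, w, h = lip_region(face)
        scores.append(float(diff[y:y + h, x:x + w].sum()))
    return int(np.argmax(scores)) if scores else -1


def run(video_source=0):
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_source)
    ok, frame = cap.read()
    if not ok:
        return
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)
        idx = current_speaker(prev_gray, gray, faces)
        if idx >= 0:
            x, y, w, h = faces[idx]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("speaker", frame)
        prev_gray = gray
        if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
            break
    cap.release()
    cv2.destroyAllWindows()
```

In practice, the per-frame motion score would be smoothed over a short window before switching the PTZ camera toward a new speaker, to avoid reacting to brief head movements.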