Matching pursuit, introduced by Mallat and Zhang, is an algorithm for decomposing a signal into a linear combination of functions chosen from a possibly redundant dictionary. A variant, which we call quantized matching pursuit, has been proposed for various lossy compression problems. Here, a simple dependent coding scheme is introduced to code the coefficients and indices in a quantized matching pursuit representation. The improvement in rate-distortion performance is shown through simulations on synthetic sources. The resulting system is used to code still images and motion-compensated video residual images. Since a DCT-based dictionary is used, the multiplicative computational complexity equals that of traditional transform coding. The image coding results are ambiguous, with a very slight increase in PSNR but no discernible subjective improvement. The video coding results are more promising, with bit-rate reductions of up to 20 percent at constant SNR. The competitive performance and design flexibility indicate that the method warrants further investigation.
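As a concrete illustration, the greedy selection loop at the heart of matching pursuit can be sketched as follows. This is a minimal toy sketch over an orthonormal DCT dictionary; the function and variable names are ours, not the paper's, and the quantization and dependent coding steps are omitted.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_iter=8):
    """Greedy matching pursuit: at each step pick the dictionary atom
    with the largest inner product with the residual, subtract its
    projection, and record (index, coefficient)."""
    residual = signal.astype(float).copy()
    atoms = []
    for _ in range(n_iter):
        corr = dictionary @ residual          # inner products with all atoms
        k = int(np.argmax(np.abs(corr)))
        coef = corr[k]
        atoms.append((k, coef))
        residual -= coef * dictionary[k]      # atoms assumed unit-norm
    return atoms, residual

# Toy dictionary: rows are unit-norm DCT-II basis vectors of length 8.
N = 8
n = np.arange(N)
dct = np.array([np.cos(np.pi * (n + 0.5) * k / N) for k in range(N)])
dct /= np.linalg.norm(dct, axis=1, keepdims=True)

x = 3.0 * dct[2] + 0.5 * dct[5]
atoms, res = matching_pursuit(x, dct, n_iter=2)
```

With an orthonormal dictionary the expansion is recovered exactly in two steps; the interesting (redundant-dictionary) case simply keeps iterating on a nonzero residual.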
We address efficient context modeling in arithmetic coding for wavelet image compression. Quantized highpass wavelet coefficients are first mapped into a binary source, followed by high-order context modeling in arithmetic coding. A blending technique combines the context models of different orders into a single probability estimate. Experiments show that an arithmetic coder with efficient context modeling can achieve a 10 percent bit-rate saving over a zeroth-order adaptive arithmetic coder in high-performance wavelet image coders.
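The idea of blending estimates from contexts of several orders can be sketched roughly as follows. This is an assumed, simple rule (weights proportional to context occurrence counts, with a Krichevsky-Trofimov style estimator); the paper's actual blending technique may differ.

```python
from collections import defaultdict

def blended_prob_one(bits, orders=(0, 1, 2)):
    """Estimate P(next bit = 1) by blending per-order context estimates.
    For each order k, counts are kept per k-bit context; the estimates
    for the current contexts are averaged, weighted by how often each
    context has been seen (an illustrative blending rule)."""
    counts = {k: defaultdict(lambda: [0, 0]) for k in orders}
    for i, b in enumerate(bits):
        for k in orders:
            if i >= k:
                ctx = tuple(bits[i - k:i])
                counts[k][ctx][b] += 1
    num = den = 0.0
    for k in orders:
        ctx = tuple(bits[len(bits) - k:]) if k else ()
        c0, c1 = counts[k][ctx]
        n = c0 + c1
        if n:
            p1 = (c1 + 0.5) / (n + 1.0)   # Krichevsky-Trofimov estimator
            num += n * p1
            den += n
    return num / den if den else 0.5
```

The blended estimate would then drive a binary arithmetic coder in place of a single fixed-order model.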
We propose a new low-complexity entropy-coding method for coding waveform signals. It combines two schemes: (1) an alphabet-partitioning method that reduces the complexity of the entropy-coding process; and (2) a new recursive set-partitioning entropy-coding process that achieves rates smaller than the first-order entropy, even with fast adaptive Huffman codecs. Numerical results for lossy and lossless image compression show the efficacy of the new method, which is comparable to the best known methods.
An algorithm for image compression, based on local histogram analysis, is presented. A given image is compressed by dividing it into nonoverlapping square blocks and coding the edge information in each block. The edge information is extracted by first differentiating the original image, quantizing the differential image, and then examining the local histograms of small blocks of the differential image. Depending on the behavior of these local histograms, the corresponding blocks in the original image are classified as visually active or visually continuous. Visually continuous blocks are coded using the mean value only; a visually active block is coded using the location and orientation of the edge within the block. As a result, the compression ratio of the proposed algorithm depends on the behavior of the local histograms, which in turn depends heavily on the quantization of the differential image. In this paper, the effect of this quantization on the compression ratio and the image quality is discussed.
Due to the perceived inadequacy of current standards for lossless image compression, the JPEG committee of the International Standards Organization (ISO) has been developing a new standard. A baseline algorithm, called JPEG-LS, has already been completed and is awaiting approval by national bodies. The JPEG-LS baseline algorithm, despite being simple, is surprisingly efficient, providing compression performance within a few percent of the best and most sophisticated techniques reported in the literature. Extensive experimentation by the authors indicates that an overall improvement of more than 10 percent in compression performance will be difficult to obtain, even at the cost of great complexity, at least with traditional approaches to lossless image compression. However, if inter-band decorrelation and modeling are allowed in the baseline algorithm, nearly 30 percent improvement in compression for specific images in the test set becomes possible at modest computational cost. In this paper we propose and investigate a few techniques for exploiting inter-band correlations in multi-band images. These techniques are designed within the framework of the baseline algorithm and require minimal changes to its basic architecture, retaining its essential simplicity.
We propose a fast and efficient algorithm that finds the optimal quad-tree (QT) decomposition with leaf dependencies in the rate-distortion sense. The underlying problem is the encoding of an image by a variable-block-size scheme, where the block size is encoded using a QT, each block is encoded by one of the admissible quantizers, and the quantizers are transmitted using a first-order differential pulse code modulation (DPCM) scheme along the scanning path. First we define an optimal scanning path for a QT such that successive blocks are always neighboring blocks. We then propose a procedure that infers such an optimal path from the QT decomposition and introduce a special optimal path based on a Hilbert curve. Next we consider the case where the image is losslessly encoded using a QT decomposition and the optimal quantizer selection. We then apply the Lagrange multiplier method to solve the lossy case, and show that the resulting unconstrained problem can be solved using the algorithm introduced for the lossless case. Finally we present a mean-value QT decomposition example, where the mean values are DPCM encoded.
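The Lagrangian pruning underlying rate-distortion-optimal quad-tree decompositions can be sketched as follows. This minimal version ignores the leaf dependencies and DPCM scanning path that are the paper's actual contribution; the `leaf_cost` interface and the one-bit split flag are illustrative assumptions.

```python
import numpy as np

def prune_quadtree(block, lam, leaf_cost, min_size=1):
    """Bottom-up R-D pruning of a quad-tree over a square block:
    a node is split only if the sum of its four children's Lagrangian
    costs J = D + lam*R beats coding the node as a single leaf.
    leaf_cost(block) -> (distortion, rate in bits)."""
    n = len(block)
    d, r = leaf_cost(block)
    j_leaf = d + lam * r
    if n <= min_size:
        return j_leaf, 1                       # forced leaf
    h = n // 2
    j_split, leaves = lam * 1.0, 0             # 1 bit to signal a split (assumed)
    for quad in (block[:h, :h], block[:h, h:], block[h:, :h], block[h:, h:]):
        jq, lq = prune_quadtree(quad, lam, leaf_cost, min_size)
        j_split += jq
        leaves += lq
    if j_split < j_leaf:
        return j_split, leaves
    return j_leaf, 1

# Mean-value leaf coder: distortion = SSE to the block mean, rate = 8 bits.
mean_cost = lambda b: (float(((b - b.mean()) ** 2).sum()), 8.0)
```

A flat block collapses to one leaf, while a block with sharply different quadrants splits, mirroring the mean-value QT example in the abstract.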
The discrete cosine transform (DCT) is widely used in all transform-based image and video compression standards due to its well-known decorrelation and energy compaction properties for typical images. Many fast algorithms for the DCT optimize various parameters such as the number of additions and multiplications, but they are input-independent and thus require the same number of operations for any input. In this paper we study the benefits of input-dependent algorithms for the DCT, which aim to minimize the average computation time by taking advantage of the sparseness of the input data. Here we concentrate on the inverse DCT (IDCT), since typical input blocks contain a substantial number of zeros. We show how to construct an IDCT algorithm based on the statistics of the input data, which are used to optimize the algorithm for the average case. We show how, for a given input and a correct model of the complexity of the various operations, we can achieve the fastest average performance.
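The basic idea of an input-dependent IDCT, doing work only for nonzero coefficients, can be sketched in one dimension. This is a naive illustration of the sparseness argument; the paper's statistically optimized algorithm is considerably more elaborate.

```python
import numpy as np

def sparse_idct(coeffs):
    """Input-dependent inverse DCT-II (1-D, orthonormal convention):
    skip the basis vectors whose coefficients are zero, so the cost
    scales with the number of nonzeros rather than the block size."""
    N = len(coeffs)
    n = np.arange(N)
    x = np.zeros(N)
    for k, c in enumerate(coeffs):
        if c == 0.0:
            continue                          # the whole point: no work for zeros
        scale = np.sqrt(1.0 / N) if k == 0 else np.sqrt(2.0 / N)
        x += c * scale * np.cos(np.pi * (n + 0.5) * k / N)
    return x
```

For a typical quantized 8-point block with only a DC term, this performs one scaled fill instead of a full matrix product.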
A method for still image coding is proposed that allows progressive transmission, because low-detail versions of the image can be reconstructed from a truncated bit stream. The proposed method is in its main aspects close to the classical pyramid approach of Burt and Adelson. While retaining the main idea of a Laplacian-pyramid-type decomposition, the new proposal differs in the filters employed for the pyramid decomposition and in the bit allocation and quantization. The image is decomposed into a centered spline Laplacian pyramid. The pyramid is quantized and coded using a layered quantization approach together with a layered coding method based on conditional arithmetic coding. The encoder outputs an embedded bit stream, so the decoder may truncate it at any point, which results in a more or less detailed image. Besides this rate-distortion scalability, the coder has a multiresolution property due to the pyramid decomposition. An extension to hybrid video coding is also discussed.
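The Laplacian pyramid structure the coder is built on can be sketched as follows, using a trivial 2x2 mean filter and nearest-neighbour upsampling as stand-ins for the paper's centered spline filters.

```python
import numpy as np

def reduce2(img):
    """Lowpass + downsample by 2 (simple 2x2 block mean)."""
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def expand2(img):
    """Upsample by 2 (nearest-neighbour replication)."""
    return np.kron(img, np.ones((2, 2)))

def laplacian_pyramid(img, levels):
    pyr = []
    cur = img.astype(float)
    for _ in range(levels):
        low = reduce2(cur)
        pyr.append(cur - expand2(low))   # detail (Laplacian) layer
        cur = low
    pyr.append(cur)                      # coarsest approximation
    return pyr

def reconstruct(pyr):
    cur = pyr[-1]
    for detail in reversed(pyr[:-1]):
        cur = expand2(cur) + detail
    return cur
```

Truncating the bit stream amounts to dropping or coarsely quantizing the finest detail layers, which is what gives the progressive, multiresolution behaviour described above.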
In this work, we present a video coding system based on 3D subband coding with motion compensation (MC-3DSBC) which is novel both in its generation of 3D subbands and in their subsequent encoding. The rate allocation from the GOP level to each class of subbands is optimized by utilizing the structural property of MC-3DSBC that additive superposition approximately holds for both rate and distortion at the unit level. Experimental results show that the performance of the proposed video coding system exceeds that of an MPEG-1 implementation for several video test sequences covering various camera and object motions.
The rate control problem can be greatly simplified by using a wavelet coder, since the desired bit rate for a particular frame can easily be reached by exploiting its embedding property. We convert the rate control problem into a bit allocation problem for each frame, and solve it with two models: frames are assumed to be independent in the basic model, while their dependency is taken into account in the advanced model. Two computationally efficient rate control algorithms are then derived. Extensive experiments demonstrate the superior performance of the wavelet coder with the proposed rate control schemes over the MPEG standard with Test Model 5.
We propose a design framework for perfectly reconstructing time-varying linear-phase paraunitary filter banks using a novel adaptive lapped transform (ALT). The ALT is based on the generalized lapped orthogonal transform (GenLOT) proposed by Queiroz. A time-varying filter bank is constructed through the factorization of the GenLOT into cascaded matrix stages. Variable-length lapped transforms are then generated by cascading a number of these matrix stages to build filters of specific lengths. Several design constraints ensure perfect reconstruction and a fast implementation. An embedded ALT image codec is presented, and the application of the ALT to the H.263 video coding standard is discussed. Preliminary results show that the ALT-based embedded image codec yields a 1.61-2.35 dB and a 2.37-4.04 dB increase in peak signal-to-noise ratio (PSNR) over the JPEG image coding standard for the Lenna and Barbara test images, respectively.
In this paper, a nonlinear approach to spatially scalable coding is developed. Within the context of the MPEG-2 scalable syntax, various decimation and prediction schemes are discussed for interlaced and progressive video. A novel spatio-temporal video interpolation technique is presented, which serves as the basic unit in the prediction schemes. In addition to the scalability techniques, a look-ahead quantization scheme is introduced for P- and B-picture coding. The new quantization scheme yields further performance improvement by selectively combining DCT-domain scalar quantization and spatial-domain entropy-constrained vector quantization. The performance of the proposed scalable scheme is compared with that of the simulcast technique and single-layer coding. Remarkable improvement over simulcast coding is achieved. While spatial scalability involves multi-layer coding, the new scalable scheme also achieves performance comparable to or better than single-layer coding.
In typical image and video coding techniques, the choice of quantizer scales at the encoder plays a key role in determining the generated bit rate and the coding performance. The distortion-rate (D-R) curve of a video unit fully characterizes the relation between quantization distortion and encoder output rate, and can thus be used to choose good quantizer scales. However, obtaining the D-R curves is computationally expensive. We develop a piecewise linear/exponential model to approximate the true curves for macroblocks. Based on this D-R model, we devise a quantizer control method that modifies the one in Test Model 5 (TM5) for MPEG-2. A reference quantizer scale is first calculated from past bit usage and the buffer fullness, as in TM5. The slope of each approximated macroblock D-R curve at this quantizer scale is then calculated. In line with human vision characteristics, an adjusted slope value is computed according to a macroblock activity measure. The quantizer scale yielding the adjusted slope on the D-R curve is found and used. Simulation results show that this method of quantizer choice attains not only higher PSNR values but also higher visual quality in coded video.
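Choosing an operating point from a target slope on a piecewise-linear D-R model can be sketched as follows. The helper name and the exact slope-matching rule are ours, for illustration only.

```python
def pick_quantizer(rd_points, target_slope):
    """rd_points: list of (rate, distortion) pairs sorted by decreasing
    rate (finer quantizers first).  Return the index of the operating
    point whose local slope dD/dR on the piecewise-linear model is
    closest to target_slope (a negative number: less rate costs more
    distortion)."""
    best, best_err = 0, float("inf")
    for i in range(1, len(rd_points)):
        r0, d0 = rd_points[i - 1]
        r1, d1 = rd_points[i]
        slope = (d1 - d0) / (r1 - r0)        # negative on a convex D-R curve
        err = abs(slope - target_slope)
        if err < best_err:
            best, best_err = i, err
    return best
```

In the method described above, the target slope would come from the activity-adjusted reference slope rather than being fixed in advance.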
By dynamically distributing the channel capacity among video programs according to their respective scene complexities, joint coding has been shown to be more efficient than independent coding for compression of multiple video programs. This paper examines the bit allocation issue for joint coding of multiple video programs and provides a bit allocation strategy that results in uniform picture quality among programs as well as within a program.
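At its simplest, the joint-coding idea of rate in proportion to scene complexity reduces to a one-line allocation rule. This is a deliberately simplified sketch; the paper's strategy additionally equalizes picture quality across and within programs.

```python
def joint_allocate(complexities, total_rate):
    """Split a shared channel rate among programs in proportion to
    their scene-complexity measures (proportional rule only)."""
    s = sum(complexities)
    return [total_rate * c / s for c in complexities]
```

A program with three times the complexity measure receives three times the rate, instead of the fixed per-program rate that independent coding would use.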
A new scene-change control scheme that improves video coding performance for sequences containing many scene-change pictures is proposed in this paper. Scene-change pictures, except intra-coded pictures, usually need more bits than normal pictures in order to maintain constant picture quality. The main idea of this paper is how to obtain the extra bits needed to encode scene-change pictures. We encode a B picture located before a scene-change picture like a skipped picture; we call such a B picture a pseudo-skipped picture. By generating the pseudo-skipped picture, we save bits that are then added to the originally allocated target bits for encoding the scene-change picture. Simulation results show that the proposed algorithm improves encoding performance by about 0.5 to 2.0 dB in PSNR compared to the MPEG-2 TM5 rate control scheme. In addition, the suggested algorithm is compatible with the MPEG-2 video syntax, and the picture repetition is not noticeable.
To improve the quality of the reconstructed image under a given bit-rate constraint, the available bits must be distributed efficiently among a set of admissible quantizers so that the source distortion is minimized. The optimal bit allocation scheme for source coding is based on Shannon's rate-distortion theory, which deals with minimizing source distortion subject to a channel rate constraint. Several fast algorithms have been suggested for optimally allocating a given quota of bits to an arbitrary set of different quantizers. However, these methods are still impractical due to their large computational burden. This paper proposes a new fast algorithm that needs less computing time than existing fast algorithms.
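One classical fast approach to this allocation problem is greedy marginal analysis, sketched below. This is the standard textbook method (optimal when the distortion tables are convex), shown for context; it is not necessarily the paper's new algorithm.

```python
import heapq

def greedy_allocate(dist_tables, total_bits):
    """dist_tables[i][b] = distortion of source i when given b bits
    (assumed convex and decreasing in b).  Repeatedly grant one bit to
    the source with the steepest distortion drop; a max-heap of marginal
    gains keeps each step O(log n)."""
    alloc = [0] * len(dist_tables)
    heap = []
    for i, t in enumerate(dist_tables):
        if len(t) > 1:
            heapq.heappush(heap, (-(t[0] - t[1]), i))   # largest drop first
    for _ in range(total_bits):
        if not heap:
            break
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        b = alloc[i]
        t = dist_tables[i]
        if b + 1 < len(t):
            heapq.heappush(heap, (-(t[b] - t[b + 1]), i))
    return alloc
```

Each granted bit goes where it buys the most distortion reduction, which is exactly the equal-slope condition of rate-distortion optimal allocation.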
Object scalability is a new trend in video coding research. This paper presents an attempt at designing an object-oriented coder based on the notions of irregular meshes and image warping. Three techniques are employed: motion compensation based on image warping; image segmentation and object tracking based on (mesh) nodal-point adjustment; and nonrectangular DCT coding performed on the irregular mesh. Preliminary simulation results, in terms of PSNR values and subjective coded image quality, indicate that this coding scheme is suitable for object-scalable coding at low bit rates.
This paper presents a new technique for object tracking. Object tracking methods usually rely on the motion homogeneity of the object to be tracked. The proposed method does not assume that the selected objects present either motion or spatial homogeneity. It is therefore a very powerful tool for enabling content-based functionalities. This object tracking technique is based on the concept of partition projection which is implemented by means of a double partition approach. The projection of the previous partition is directly imposed in the current image so that temporal stability is improved. Several examples in different scenarios are finally presented in order to assess the performance of the method.
Region segmentation of images is a well-known ill-posed problem, for which a regularization approach seems appropriate. In this paper, an active region segmentation algorithm based on regularization using the Hopfield neural network is proposed. The objective function minimized by the network is defined by criteria that integrate region growing and edge detection for image segmentation. Because the energy of the network tends to converge to a local minimum, pyramid images are used to avoid such local minima and to achieve fast convergence. Moreover, the active region segmentation algorithm is applied to a sequence of color images to track an object region that changes in appearance through complex and nonstationary background/foreground situations. Experimental results show that it is possible to segment images and track the object region using the energy-minimization principle of the Hopfield neural network.
This paper presents an approach for detecting human faces and eyes in real time and in uncontrolled environments. The system has been implemented on a PC platform with simple commercial devices, such as an NTSC video camera and a monochrome frame grabber. The approach is based on a probabilistic framework that uses a deformable template model to describe the human face. The system has been tested on head-and-shoulder sequences as well as complex scenes with multiple people and random motion. The system is able to locate the eyes from different head poses. The location of the eyes is used to extract faces with frontal pose from a video sequence; the extracted frontal frames can then be passed to recognition and classification systems for further processing.
The problem of automatically determining an uncalibrated camera's motion through space solely from its view of the static surroundings has only recently received attention. In this work, we present a new direct method for computing camera egomotion from optical flow data in the particular case of a camera with an unknown and possibly varying focal length. Here, egomotion refers to motion expressed with respect to the camera's local frame of reference. No restrictions are placed on the nature of the camera's motion other than that its translational and rotational components vary smoothly. Essential to the approach is the derivation of a differential form of the time-dependent epipolar equation for a single moving camera. The method requires that two special matrices be computed from the optical flow data. Closed-form expressions, given in terms of the entries of these two matrices, are then provided for the egomotion parameters, the focal length, and its derivative. This self-calibration process constitutes an essential prerequisite to reconstructing the viewed scene from optical flow.
A new method to detect and track non-rigid objects moving against a moving background is presented. Optical-flow-based segmentation methods are not appropriate for discriminating non-rigid objects from the background, as such objects have non-uniform optical flow. The proposed method uses optical flow and texture similarity to discriminate objects from the background. A well-known neural-network algorithm, Kohonen's self-organizing map, was adapted to measure the texture similarity.
Image sequences are increasingly popular for the reconstruction of 3D images in computer-aided surgery and radiotherapy, where contour detection is needed and plays a significant role. To resolve the conflict between optimization and computational cost, we have recently developed a novel algorithm to track contours in an image sequence automatically. The procedure starts from a list of labeled seed points on or near the desired boundary in the first frame, and extracts the first contour by dynamic programming (DP). This contour is thickened symmetrically to form a band area, which is assumed to cover the desired contour in the second frame; meanwhile, the previous seed points are regarded as uncertainties in the second frame. A new method is then proposed to optimize these points within the band: DP is applied again between two uncertainties that are t (t > 1) points apart, yielding an optimal path. Such a path may depart from the true contour near the two end points, but it provides optimal choices for the intervening uncertainties. After all the uncertainties are optimized, the second optimal contour can be tracked; it in turn participates in the tracking of the next frame, until all the contours in the sequence are outlined. Experiments show optimal and intersection-free results on sequences of cardiac vessels.
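The dynamic-programming step that snaps a contour to a boundary can be sketched on a small cost image as follows. This is a generic minimum-cost path with unit row moves per column; the band construction and seed handling of the method above are omitted, and all names are ours.

```python
def dp_min_path(cost):
    """Dynamic programming over a 2-D cost array: find the minimum-cost
    left-to-right path that moves at most one row per column, the same
    shortest-path idea used to snap a contour onto low-cost (edge)
    pixels.  Returns the list of row indices, one per column."""
    rows, cols = len(cost), len(cost[0])
    acc = [row[:1] + [0.0] * (cols - 1) for row in cost]   # accumulated cost
    back = [[0] * cols for _ in range(rows)]               # backpointers
    for j in range(1, cols):
        for i in range(rows):
            cands = [(acc[p][j - 1], p) for p in (i - 1, i, i + 1) if 0 <= p < rows]
            best, p = min(cands)
            acc[i][j] = cost[i][j] + best
            back[i][j] = p
    # trace back from the cheapest endpoint in the last column
    i = min(range(rows), key=lambda r: acc[r][cols - 1])
    path = [i]
    for j in range(cols - 1, 0, -1):
        i = back[i][j]
        path.append(i)
    return path[::-1]
```

In the tracking algorithm, `cost` would be derived from image gradients inside the band around the previous contour.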
In this paper we propose a segmentation method aimed at separating the moving objects from the background in a generic video sequence. This task, accomplished at the coder side, is intended to support new functionalities for accessing and decoding single objects of the coded video sequence, as foreseen by the innovative multimedia scenarios addressed in the MPEG-4 work. The proposed segmentation method comprises motion detection, which produces a preliminary segmentation map, followed by a morphological regularization that plays an important role in eliminating misclassifications due to motion estimation ambiguities, noise, etc. The motion detection is essentially based on a higher-order statistics (HOS) test applied to a temporally, nonlinearly filtered version of the video sequence; this choice is motivated by the detection properties of HOS. The regularization phase, performed by basic morphological operators, imposes a local connectivity constraint on the background/foreground map. The performance of the segmentation algorithm is illustrated by experimental results on MPEG-4 test sequences.
This paper presents an integrated approach to segmenting moving foreground, which is of most interest to the viewer. Multiple cues are used (focus, intensity, and motion) in a two-layered neural network. Focus and motion measurements are taken from high-frequency data (edges), whereas intensity measurements are taken from low-frequency data (object interiors). Combined, these measurements are used to segment a complete object. Results indicate that moving foreground can be segmented from stationary foreground and from moving or stationary background. The neural network segments the entire object, both interior and exterior, in this integrated approach. Results also demonstrate that combining cues allows flexibility in both the type and complexity of scenes, and improves accuracy in segmenting complex scenes containing both moving foreground and moving background. Good segmentation yields bit-rate savings when coding the object of interest, also called the video object in MPEG-4. Our method combines simple measurements to increase segmentation robustness.
Automatic generation of textured object models from a sequence of range and color images requires two major tasks: measurement registration and measurement integration. Measurement registration is the estimation of the current position and orientation of the object in 3D space with respect to an arbitrary fixed reference, given the current measurement and the 3D object model under construction. Measurement integration is the updating of the 3D object model using the current registered measurement. In this paper we present an iterative 3D-3D registration technique that uses both the texture and shape information available in the 3D object models and the 3D measurements. The proposed technique handles probabilistic models that are potentially incomplete before the measurement integration step. Measurements are acquired via a sensor characterized by a probabilistic sensor model. The object models are constructed automatically without user interaction. Each model is a compact uniform tessellation of 3D space, where each cell of the tessellation represents shape and texture in a probabilistic fashion. Free-form objects are supported and no prior knowledge about the object shape, texture or pose is assumed. Traditional registration methods consider only shape and geometric information. We consider texture information as additional evidence by defining a generalized inter-cell distance measure that accounts for both the relative positioning of cells in space and the texture discrepancy between cells. Experimental results demonstrate the efficiency and robustness of the proposed method. The usefulness of texture in registration is highlighted in a comparison with results obtained considering only geometric information.
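A generalized inter-cell distance of the kind described might blend a geometric term with a texture term. The following sketch is a hypothetical form: the weighting factor `alpha`, the cell tuple layout, and the colour metric are our assumptions, not the paper's definitions.

```python
import math

def cell_distance(cell_a, cell_b, alpha=0.5):
    """cell = (x, y, z, (r, g, b)): blend Euclidean positional distance
    with a normalised colour (texture) discrepancy."""
    (xa, ya, za, ca), (xb, yb, zb, cb) = cell_a, cell_b
    geo = math.dist((xa, ya, za), (xb, yb, zb))       # relative positioning
    tex = math.dist(ca, cb) / (255.0 * math.sqrt(3))  # texture gap in [0, 1]
    return alpha * geo + (1 - alpha) * tex
```

With `alpha=1.0` the measure degenerates to the purely geometric distance used by traditional registration methods, which is the comparison baseline mentioned above.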
Current video coding standards use simple translational motion models in predefined blocks to reduce temporal redundancy. Although this approach is quite effective, artifacts are produced by the simplistic motion model used and the fixed motion boundaries. The motion compensation algorithm presented in this paper uses a new partially connected motion model that removes the constraint of fixed block sizes and translational motion. The resulting motion field is guaranteed to be smooth and generally provides a good approximation to the true motion in the scene. The parameters of this motion model are optimized using a previously described algorithm that provides an efficient method for updating several motion vectors simultaneously. The complete motion estimation/compensation algorithm is incorporated into a transform-based coder and compared to an existing H.263 codec. Since the transmitted motion field provides a good approximation to the true motion in the scene, an additional advantage of the partially connected motion model is demonstrated by using the transmitted motion parameters to interpolate skipped frames at the receiver.
A new hierarchical block matching algorithm, particularly suitable for large search areas, is proposed. The algorithm exploits spatial motion vector correlation within a fixed hierarchical search structure. Motion vectors of causally neighboring blocks can be used to predict the motion vector of the current block when the spatial motion vector correlation is strong; however, they are not helpful for complex or random motion. The proposed algorithm consists of two search levels. The higher level selects two initial estimates: one obtained from motion vector correlation, for continuous motion, and the other by minimizing the mean absolute difference over rectangularly-sampled motion vector candidates in the search area, for random or complex motion. The lower level performs the final motion vector refinement. Compared with previous hierarchical block matching algorithms, the scheme improves the accuracy of the estimated motion vector for random/complex as well as continuous motion. It is also well suited to hardware implementation because of its simple, fast, and regular search procedure. Simulation results show that the proposed algorithm drastically reduces the computational complexity, to 5.0 percent of that of the full search block matching algorithm, with a minor PSNR degradation of 0.4 dB even in the worst case.
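The two-level structure can be sketched as follows. This is an illustrative reconstruction: `pred` stands in for the vector predicted from causally neighboring blocks, and the block size, search range, and sampling step are placeholder choices.

```python
def get_block(frame, y, x, n):
    return [row[x:x + n] for row in frame[y:y + n]]

def mad(a, b):
    """Mean absolute difference between two equal-sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b)
               for p, q in zip(ra, rb)) / (len(a) * len(a[0]))

def two_level_search(cur, ref, y, x, pred=(0, 0), n=4, rng=8, step=4, refine=1):
    tgt = get_block(cur, y, x, n)
    hmax, wmax = len(ref) - n, len(ref[0]) - n

    def cost(v):
        dy, dx = v
        ry, rx = y + dy, x + dx
        if 0 <= ry <= hmax and 0 <= rx <= wmax:
            return mad(tgt, get_block(ref, ry, rx, n))
        return float('inf')

    # Level 1: the neighbour-predicted vector versus the best
    # rectangularly-sampled candidate in the search area.
    coarse = min(((dy, dx) for dy in range(-rng, rng + 1, step)
                  for dx in range(-rng, rng + 1, step)), key=cost)
    start = min([pred, coarse], key=cost)
    # Level 2: final refinement around the chosen initial estimate.
    return min(((start[0] + dy, start[1] + dx)
                for dy in range(-refine, refine + 1)
                for dx in range(-refine, refine + 1)), key=cost)
```

Because only the coarse grid plus two small neighbourhoods are evaluated, the number of MAD computations is a small fraction of an exhaustive search, which is the complexity saving the abstract quantifies.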
We describe a quad-tree based variable size block matching (VSBM) motion estimation algorithm which is as computationally efficient as fixed size block matching (FSBM) and yet provides a better quality prediction. The 'match and merge' scheme allows the dimensions of blocks to adapt to local activity within the image, and the total number of blocks in any frame can be varied while still representing true motion fairly accurately. This permits adaptive bit allocation between the representation of displacement and residual data, and variation of the overall bit allowance on a frame-by-frame basis. The cost of coding the motion information from the VSBM technique is compared with the 2D motion vector prediction adopted by H.263 and MPEG-4 using FSBM with 16 by 16 macroblocks. 1D and 2D VSBM motion vector prediction strategies are described. The techniques are evaluated using two complete MPEG-4 test sequences. For similar quality prediction (same mean square error), 16 percent fewer bits are required to code the motion vectors from the 'Foreman' sequence using the VSBM technique and a 2D predictor. The saving increases to 68 percent for the 'Container Ship' sequence, in which there is less disparate motion. The cost of including the quad-tree description is included in both cases.
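A minimal 'merge' pass over fixed-size motion vectors might look like the following sketch, which merges four quad-tree siblings into their parent whenever their vectors agree exactly. The exact-equality test is an assumed simplification of the paper's matching criterion.

```python
def merge_pass(mv, s):
    """One bottom-up merge pass. mv maps the (y, x) origin of each s-sized
    block to its motion vector; emit a 2s block wherever four siblings agree,
    otherwise keep the four s blocks. Keys of the result are (y, x, size)."""
    out = {}
    ymax = max(y for y, _ in mv)
    xmax = max(x for _, x in mv)
    for y in range(0, ymax + 1, 2 * s):
        for x in range(0, xmax + 1, 2 * s):
            quad = [(y, x), (y, x + s), (y + s, x), (y + s, x + s)]
            if all(q in mv for q in quad) and len({mv[q] for q in quad}) == 1:
                out[(y, x, 2 * s)] = mv[quad[0]]   # four siblings agree: merge
            else:
                for q in quad:
                    if q in mv:
                        out[(q[0], q[1], s)] = mv[q]
    return out
```

Repeating the pass on the merged result grows blocks in uniform-motion areas while leaving small blocks where motion is locally disparate, which is how the block count adapts to activity.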
A novel idea to reduce OBMC search complexity based on checkerboard block grouping is proposed in this work; we call the proposed scheme GOBMC. No iteration is required in the proposed scheme for encoding, since the obtained OBMC motion vector set nearly reaches a locally optimal solution in one iteration step, and the complexity is therefore significantly reduced. The distortion measures, both in terms of PSNR and visual quality, remain about the same as those obtained from iterative OBMC motion search. In decoding, we propose an OBMC reconstruction which reduces the number of multiplications by 38 percent while preserving the visual quality obtained from BMC search with OBMC reconstruction.
The optical flow (OF) can be used to perform motion-based segmentation or 3D reconstruction. Many techniques have been developed to estimate the OF. Some approaches are based on global assumptions; others deal with local information. Although OF has been studied for more than a decade, reducing the estimation error is still a difficult problem. Generally, algorithms to determine the OF are based on an equation which links the gradient components of the luminance signal so as to impose its invariance over time. Therefore, to determine the OF, it is usually necessary to calculate the gradient components in space and time. A new way to approximate this gradient information from a spatio-temporal wavelet decomposition is proposed here. In other words, assuming that the luminance information of the video sequence is represented in a multiresolution structure for compression or transmission purposes, we propose to estimate the luminance gradient components directly from the coefficients of the wavelet transform. Using a multiresolution formalism, we provide a way to estimate the motion field at different resolution levels. OF estimates obtained at low resolution can be projected to higher resolution levels so as to improve the robustness of the estimation to noise and to better locate the flow discontinuities, while remaining computationally efficient. Results are shown for both synthetic and real-world sequences, comparing the method with a non-multiresolution approach.
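The invariance equation alluded to above is the standard brightness-constancy (optical flow) constraint; for luminance $I(x, y, t)$ and flow field $(u, v)$ it reads:

```latex
\frac{\partial I}{\partial x}\,u \;+\; \frac{\partial I}{\partial y}\,v \;+\; \frac{\partial I}{\partial t} \;=\; 0
```

The proposal is thus to obtain the three partial derivatives directly from the spatio-temporal wavelet coefficients, rather than by finite differences on the pixel data.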
In this paper, we describe a method for temporal tracking of video objects in video clips. We employ a 2D triangular mesh to represent each video object, which allows us to describe the motion of the object by the displacements of the node points of the mesh, and to describe any intensity variations by the contrast and brightness parameters estimated for each node point. Using the temporal history of the node point locations, we continue tracking the nodes of the 2D mesh even when they become invisible because of self-occlusion or occlusion by another object. Uncovered parts of the object in subsequent frames of the sequence are detected by means of an active contour which contains a novel shape-preserving energy term. The proposed shape-preserving energy term is found to be successful in tracking the boundary of an object in video sequences with complex backgrounds. By adding new nodes to or updating the 2D triangular mesh, we incrementally append the uncovered parts of the object detected during tracking to the object model, generating a static mosaic of the object. Also, by texture mapping the covered pixels into the current frame of the video clip, we can generate a dynamic mosaic of the object. The proposed mosaicing technique is more general than those reported in the literature because it allows for local motion and out-of-plane rotations of the object that result in self-occlusion. Experimental results demonstrate the successful tracking of objects with deformable boundaries in the presence of occlusion.
Source coding of multi-view image sequences is investigated. Two different frameworks are considered. The first one is a bidirectional predictive coding scheme, in which the displayed frames can be instantaneously decoded given the coded furthest right and left images. The second one is a unidirectional predictive coding scheme, in which the central frame is an I-frame and the remaining frames are coded based on this frame. Both frameworks can overcome problems related to occlusion, are compatible with current and proposed image and video coding standards, consider the special indexing characteristics of multi-view sequences, and can be implemented in applications where the display device has a small buffer. The actual coding is performed using the subspace projection technique, a locally adaptive transform approach. The transformation matrix is equivalent to a projection operation and is determined using the local cross-correlation characteristics. Several design issues in generating the adaptive transforms and experimental results are presented.
A rate-distortion framework is used to define a displacement vector-field estimation technique for use in video coding. This technique achieves maximum reconstructed image quality under the constraint of a target bitrate for the coding of the vector sequence. Use of this technique is evaluated for two application areas in which the need for high compression of displacement vector fields is particularly acute. The first is motion-field coding for very low bit rate image sequence transmission, as in videophone applications. The second is coding for the transmission of disparity fields, which is needed for the generation at the receiver of intermediate viewpoints through spatial interpolation, as well as in a number of other applications requiring accurate depth knowledge, including 3D medical data transmission and transmission of scenes to be postprocessed using depth-keyed segmentation. Experimental results illustrating the performance of the proposed technique in these application areas are presented and evaluated.
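A constrained problem of this kind (maximize quality subject to a bit budget for the vector field) is conventionally handled through an unconstrained Lagrangian cost; the following is a generic rendering, not necessarily the authors' exact notation:

```latex
\min_{\mathbf{d}} \; J(\mathbf{d}) \;=\; D(\mathbf{d}) \;+\; \lambda\, R(\mathbf{d}),
\qquad \lambda \ \text{chosen so that} \ R(\mathbf{d}^{*}) \le R_{\mathrm{target}},
```

where $D$ is the reconstruction distortion, $R$ the bits spent on the displacement field $\mathbf{d}$, and $\lambda$ the multiplier that trades one against the other.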
This paper presents a technique for multichannel optical flow estimation. We combine the brightness constancy constraint with the relationship between optical flow fields and disparity fields, estimating optical flow fields by solving systems of linear equations. In this formulation, the so-called round-about compatibility constraint is used to ensure coherence between optical flow fields and to pin down the tangential components of optical flow vectors, which are indiscernible from the first-order intensity derivatives. Since disparity information is used to assist optical flow estimation, a reliable disparity field estimator is required. The disparity fields are recovered using a disparity space image technique. We also discuss the sensitivity of this technique when the essential parallel-axes camera configuration assumption is violated. Experiments show that the technique is very promising for practical applications.
In this paper we present a combination of three steps to code a disparity map for 3D teleconferencing applications. First we introduce a new disparity map format, the chain map, which has very low inherent redundancy. Additional advantages of this map are: one single bidirectional map instead of the usual two unidirectional vector fields, explicit indication of occlusions, no upper or lower bound on disparity values, no disparity offset, easy generation by disparity estimators, and easy interpretation by image interpolators. In a second step, we apply data reduction to the chain map. The reduction is by a factor of two, at the cost of losing explicit information about the position of occlusion areas. An algorithm for image interpolation in the absence of occlusion information is presented. The third step involves entropy coding, both lossless and lossy. A scheme specially suited to the chain map has been developed. Although the codec is based on a simple prediction process without motion compensation, compression ratios of 20 to 80 can be achieved with typical teleconferencing images. These results are comparable to those obtained by complex schemes based on 2D/3D motion compensation using disparity vector fields.
This work aims at determining dense motion and disparity fields, given a stereoscopic sequence of images, for the construction of stereo interpolated images. At each time instant the two dense motion fields, for the left and the right sequences, and the disparity field of the next stereoscopic pair are jointly estimated. The disparity field of the current stereoscopic pair is considered known. The disparity field of the first stereoscopic pair is estimated separately. For both problems, multi-scale iterative relaxation algorithms are used. Stereo occlusions and motion occlusions/disclosures are detected using error confidence measures. For the reconstruction of intermediate views, a disparity-compensated linear interpolation algorithm is used. Results are given for real stereoscopic data.
The increasing demand for 3D imaging and recent developments in autostereoscopic displays will accelerate the usage of 3D systems in various areas. However, limited channel bandwidth is, as for monocular images, the main bottleneck for realizing 3D systems. As a result, an efficient compression algorithm is essential to reduce the bandwidth requirement while maintaining the perceptual visual quality at the decoder. In this paper, we focus on compression of stereo images. In stereo image coding, we can take advantage of binocular redundancy by using disparity compensation. The most popular disparity compensation approaches so far have been block-based methods, due mostly to their simplicity. Block-based methods, however, may suffer from blocking artifacts at low bit rates due to the uniform disparity assumption within a fixed block. Meanwhile, if we reduce the block size, the disparity estimation may suffer from various noise effects, which increase the bit rate for the disparity. Considering these observations, we estimate disparity based on a small block or a pixel with the energy equation derived from the MRF model. In order to prevent oversmoothing across boundaries, we use the combined intensity edges of the two images as an initial disparity boundary. Then, we segment the resulting smooth disparity field. Finally, the disparity and the starting position are encoded using DPCM and the corresponding boundary is encoded using run-length chain coding. At the end of the paper, we present experimental results.
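An energy derived from an MRF model of this kind typically pairs a matching term with an edge-gated smoothness term; the following is a generic form, and the paper's exact definition may differ:

```latex
E(d) \;=\; \sum_{(x,y)} \bigl[\,I_L(x,y) - I_R\bigl(x - d(x,y),\, y\bigr)\bigr]^2
\;+\; \lambda \sum_{\langle p,q \rangle} \bigl(1 - e_{pq}\bigr)\,\lvert d_p - d_q \rvert ,
```

where the first sum measures how well the disparity $d$ matches left and right intensities, the second sum runs over neighbouring sites, and $e_{pq} = 1$ on the combined intensity edges used as initial disparity boundaries, so that smoothing is suppressed across them.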
In this paper, a new feature-based method for the correspondence problem is proposed. The feature used is the boundary of a parameter-dependent connected component of an image, introduced in earlier work. In general, there is no one-to-one relationship between the points of a pair of stereo images. We prove that under certain conditions, there is such a relationship between the sets of boundary points of the connected components of the stereo images. This correspondence may be identified through the epipolar geometry. Furthermore, since such connected components may represent meaningful objects in images, we may obtain the corresponding objects directly, instead of only the corresponding points or lines as in other methods, which need a feature grouping process to conduct further analysis. Also, a hierarchy of the connected components provides us with a coarse-to-fine matching method to recover the surface corresponding to a connected component.
As 2D image communication systems come into widespread use, 3D imaging technology, which enhances the reality of visual communication, is coming to be considered a promising next-generation medium that can revolutionize information systems. To date, 3D image communication has not been discussed at a comprehensive level because several kinds of promising 3D display technologies are still making rapid progress. Considering this situation, this paper introduces the concept of the 'integrated 3D visual communication system'. The key feature of this new concept is a display-independent neutral representation of visual data. The flexibility of this concept will promote the progress of 3D image communication systems before 3D display technology reaches maturity. For this purpose, a ray-based approach is examined in this paper. In the present representation method, all ray data are treated equally as a set of orthogonal views of the scene objects. The advantage of this approach is that it allows the synthesis of any perspective view by gathering appropriate ray data from the set of orthogonal views, independently of any geometric representation. A real-time progressive transmission method has also been examined. The experimental results show how the present representation method could be applied to next-generation 3D image communication systems.
This paper introduces a new approach to feature-based head tracking and pose estimation. Head tracking and pose estimation find their most important applications in motion analysis for model-based video coding. The proposed algorithm employs an underlying 3D head model, feature-based pose estimation, and texture mapping to produce accurate templates for the feature tracking. In this way, the set of templates used for the matching is constantly updated with the pose changes, allowing the algorithm to track the features over a large range of head motion without loss of precision and error accumulation. Given a rough estimate of the head scale, the initial feature identification is performed automatically and the tracking is successful over a large number of video frames. Computational complexity is also considered with the aim towards creating a real-time end-to-end model-based video coding system.
In this paper, a new contour coding technique using motion information is proposed for object-oriented video coding. In the conventional technique using motion information by Gu, region-based motion compensation is performed, and only the contours neighboring motion-failure (MF) regions are encoded by a proper intra-frame contour coding technique. However, since even small local motions within a region may incur MF regions, the temporal correlation of the contours cannot be effectively exploited by the conventional techniques. In our approach, those MF regions are significantly reduced by employing a two-stage motion compensation. While region-based motion compensation is performed in the first stage, the MF regions proceed to the second-stage motion compensation, which finds the contours in the previous frame that coincide with the contours of the MF regions. In addition, by introducing the notion of an error band, the current contours can be properly fitted to the motion-compensated contours using the inter-frame relationship of the contours. Simulation results show that the proposed technique provides better performance than the conventional technique by Gu. Moreover, by varying the width N of the error band, the bit amount for the shape information can be adjusted according to the channel condition.
This study investigates the design and performance of a spatial-domain image encoding scheme that adapts to the localized statistical structure of an image. An adaptive differential pulse code modulation (DPCM) image coding system operates on an image that has been preprocessed into variable-size square blocks. Each block is separately encoded by a DPCM system whose parameters are obtained from an underlying nonstationary image model fitted to the block. The source coding performance of the adaptive DPCM algorithm proposed in this study has been found to yield an improvement of 2.5 dB or greater compared to that obtained using a non-adaptive, conventionally designed DPCM encoder/decoder pair operating at low bit rates. Reconstructed images obtained in this study are of perceptually higher quality, because the adaptive encoding system design is based on the more realistic assumption of nonstationary statistics. Specifically, experiments have revealed that reconstructed edges within local regions of the image are sharper, providing an overall improvement in a viewer's subjective assessment of global image quality.
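A minimal per-block DPCM sketch in this spirit follows: a first-order predictor coefficient is fitted to each block, and the prediction residual is uniformly quantized. The 1D block handling, least-squares fit, and quantizer step are illustrative assumptions; the paper's nonstationary model and parameter estimation are more elaborate.

```python
def dpcm_block(samples, step=4):
    """Encode/decode one 1D block; returns (residual codes, reconstruction, a)."""
    # Fit the predictor coefficient a minimizing sum (x[n] - a*x[n-1])^2.
    num = sum(samples[i] * samples[i - 1] for i in range(1, len(samples)))
    den = sum(s * s for s in samples[:-1]) or 1
    a = num / den
    codes, recon, prev = [], [samples[0]], samples[0]
    for x in samples[1:]:
        e = x - a * prev               # prediction residual
        q = round(e / step)            # uniform residual quantizer
        prev = a * prev + q * step     # decoder-side reconstruction
        codes.append(q)
        recon.append(prev)
    return codes, recon, a
```

Fitting `a` per block is what makes the scheme adaptive: smooth blocks get a near-unity coefficient and tiny residuals, while active blocks get a predictor matched to their local statistics.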
A vector quantization based on a psychovisual lattice is used in a visual-components image coding scheme to achieve a high compression ratio with excellent visual quality. The vector construction methodology preserves the main properties of the human visual system concerning the perception of quantization impairments and takes into account the masking effect due to interaction between subbands with the same radial frequency but different orientations. The vector components are the local band-limited contrasts Cij, defined as the ratio between the luminance Lij at a point, which belongs to radial subband i and angular sector j, and the average luminance at this location corresponding to the radial frequencies up to subband i-1. Hence the vector dimension depends on the orientation selectivity of the chosen decomposition. The low-pass subband, which is nondirectional, is scalar quantized. The performance of the coding scheme has been evaluated on a set of images in terms of peak SNR, true bit rates and visual quality. No impairments are visible at a distance of 4 times the height of a high quality TV monitor. The SNRs are about 6 to 8 dB below those of classical subband image coding schemes producing the same visual quality. Due to the use of the local band-limited contrast, the particularity of this approach lies in the structure of the reconstruction error, which is found to be highly correlated with the structure of the original image.
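The local band-limited contrast described above can be written, in a generic notation with $L_k$ the luminance contribution of radial band $k$, as:

```latex
C_{ij}(x, y) \;=\; \frac{L_{ij}(x, y)}{\displaystyle\sum_{k=0}^{i-1} L_{k}(x, y)} ,
```

i.e. the luminance of radial subband $i$, angular sector $j$, normalized by the local mean luminance reconstructed from all radial bands below $i$.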
This paper presents a simple color segmentation technique which can be used in model-based very low bit-rate coding approaches for videophone applications, in which delimitation of the speaker's face is required. This work attempts to segment the speaker's face using color cues. To take better advantage of the color content of images, the segmentation is carried out in HSI (hue, saturation, intensity) space, with the three components used in two steps. The original image is first split into two groups of regions, one with higher saturation values and the other with lower saturation values, by applying an adaptive threshold to the saturation histogram. In the high-saturation regions, the hue component furnishes useful references for further segmentation, while in the low-saturation regions the intensity component plays a similar role. For each group of regions, a multi-thresholding technique based on either the hue or the intensity component is then proposed for the subsequent segmentation. After both groups of regions are segmented, a combination of the two segmentation results provides the final segmented image. Experiments with images taken from typical 'head-and-shoulders' videophone sequences are carried out and some results are presented.
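The saturation-gated split can be sketched per pixel as follows. The HSI conversion uses a common textbook formulation, and the fixed thresholds are placeholders for the adaptive, histogram-derived ones described above.

```python
import math

def rgb_to_hsi(r, g, b):
    """Textbook HSI: intensity is the channel mean, saturation the
    normalized distance from gray, hue an angle in radians."""
    i = (r + g + b) / 3.0
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i
    h = math.atan2(math.sqrt(3.0) * (g - b), 2.0 * r - g - b)
    return h, s, i

def classify(pixel, s_thresh=0.2, h_split=0.0, i_split=128):
    """High-saturation pixels are labelled by hue, low-saturation by intensity."""
    h, s, i = rgb_to_hsi(*pixel)
    if s > s_thresh:                                  # chromatic: segment on hue
        return 'hue_hi' if h > h_split else 'hue_lo'
    return 'int_hi' if i > i_split else 'int_lo'      # achromatic: on intensity
```

A saturated green pixel is routed to the hue branch, while a gray pixel, whose hue is meaningless, is routed to the intensity branch, which is the rationale for the two-step split.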
Digitized video and audio have become the trend in multimedia because they offer great quality and ease of processing. However, since a huge amount of information must be carried over limited bandwidth, data compression plays an important role in such systems. For example, a 176 x 144 monochrome sequence at a frame rate of 10 frames/sec requires a bandwidth of about 2 Mbps, which wastes channel resources and limits the applications. MPEG (moving picture experts group) standardizes the video codec scheme, achieving a high compression ratio while providing good quality. MPEG-1 is intended for frame sizes of about 352 x 240 at 30 frames per second, and MPEG-2 provides scalability and can be applied to scenes with higher definition, such as HDTV (high-definition television).
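The 2 Mbps figure quoted above follows from simple arithmetic, assuming 8 bits per monochrome sample:

```python
# Raw bandwidth check for the QCIF example above.
width, height = 176, 144      # QCIF luminance resolution
bits_per_pixel = 8            # monochrome, 8 bits per sample (assumption)
frame_rate = 10               # frames per second
raw_bitrate = width * height * bits_per_pixel * frame_rate
# 2027520 bits/s, i.e. roughly the 2 Mbps quoted in the text
```
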
On the other hand, some applications, such as videophone and video-conferencing, require very low bit rates. Because channel bandwidth is severely limited in the telephone network, a very high compression ratio is required. ITU-T announced the H.263 video coding standard to meet these requirements. According to the simulation results of TMN-5, it outperforms H.261 with little additional complexity. Since wireless communication is the trend of the near future, low-power design of the video codec is an important issue for portable visual telephones.
Motion estimation is the most computation-intensive part of the whole video codec: about 60% of the encoder's computation is spent on it. Several architectures have been proposed for efficient processing of block matching algorithms. In this paper, in order to meet the requirements of H.263 and the expectation of low power consumption, a modified sandwich architecture is proposed. Based on a parallel processing philosophy, low power is expected, and the generation of either one motion vector or four motion vectors with half-pixel accuracy is achieved concurrently. In addition, we present our solution for supporting the additional coding modes of H.263 with the proposed architecture.
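The block matching algorithms such architectures accelerate can be sketched in a few lines. This is a generic exhaustive full search with a SAD criterion, given here only to make the computational load concrete; it is not the sandwich architecture itself:

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def full_search(ref, cur, bx, by, bsize, srange):
    """Exhaustive block matching: find the displacement (dx, dy) within
    +/-srange that minimises SAD between the current block at (bx, by)
    and a candidate block in the reference frame."""
    h, w = len(ref), len(ref[0])
    cur_block = [row[bx:bx + bsize] for row in cur[by:by + bsize]]
    best, best_cost = (0, 0), float('inf')
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + bsize > w or y + bsize > h:
                continue  # candidate falls outside the reference frame
            cand = [row[x:x + bsize] for row in ref[y:y + bsize]]
            cost = sad(cur_block, cand)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best
```

The (2·srange+1)² candidate evaluations per block are what make this step dominate encoder complexity and motivate parallel architectures.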
Although block-based image compression techniques seem to be straightforward to implement on parallel MIMD architectures, problems might arise due to architectural restrictions on such parallel machines. In this paper we discuss possible solutions to such problems occurring in different image compression techniques. Experimental results are included for adaptive wavelet block coding and fractal compression.
The application of morphological methods in image analysis has often shown its efficiency by providing a geometrical approach to signal processing, whereas most other techniques offer a spatial-frequency approach. However, the morphological operators used in applications such as image segmentation are highly computation intensive because of their iterative, sequential, and irregular behavior, and it is therefore difficult to reach real-time performance. To improve performance, parallel processing is a necessity, and in this paper we present a parallel implementation of morphological operators using the single-chip multiprocessor TMS320C80. First, we study the parallel implementation of basic morphological operators. These operators are well suited to parallelism and, because of their regular behavior, do not pose major problems. We then propose a parallelization of morphological filters by reconstruction, which are essentially irregular.
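The irregular behavior of reconstruction filters mentioned above comes from their data-dependent iteration. A 1-D sketch (generic textbook formulation, not the TMS320C80 implementation) of geodesic reconstruction by dilation:

```python
def dilate(signal, size=1):
    """Grayscale dilation of a 1-D signal with a flat structuring
    element of half-width `size`."""
    n = len(signal)
    return [max(signal[max(0, i - size):min(n, i + size + 1)])
            for i in range(n)]

def geodesic_reconstruct(marker, mask):
    """Reconstruction by dilation: repeatedly dilate the marker and clip
    it under the mask until stability. The number of iterations depends
    on the data, which is the irregularity discussed in the text."""
    current = list(marker)
    while True:
        nxt = [min(d, m) for d, m in zip(dilate(current), mask)]
        if nxt == current:
            return current
        current = nxt
```
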
Fractal image compression is computationally intensive, so speedup techniques are required to achieve running times comparable to other compression techniques. In this paper we combine sequential and parallel techniques suitable for MIMD architectures, which moves this compression scheme closer to real-time processing. The algorithms introduced are especially designed for memory-critical environments.
For video applications such as video-on-demand, signaling is an important step of service initialization, covering service setup, authentication, and resource allocation. This paper describes the design and implementation of an experimental platform for delivery of real-time MPEG-encoded video over a LAN with DSM-CC signaling. The platform consists of a real-time MPEG encoder system as a server, a network node workstation, and a client with MPEG decoding capability. Specifically, we have implemented a client-initiated subset of the DSM-CC user-to-network (UN) protocol for signaling. Communication between the network node and the server/client uses the TCP/IP protocol. The network node provides the functionality of upstream message exchange, UN and/or UU (user-to-user) resource negotiation, and MPEG bitstream relay. Our current platform is quite robust in operation and can deliver MPEG video at up to about 1.5 Mb/s via DSM-CC protocols, demonstrating that practical implementation of video applications is feasible over small-scale LANs.
The quality of a multimedia service depends on the continuity of each media object and on the synchronization among media objects. These constraints apply equally to on-demand and real-time services. Continuity of a media object guarantees stable playback of that object, and synchronization of media objects is attained using a common time base. To achieve fine synchronization of MPEG-2 media, a delay model of the MPEG-2 codec system is presented. Based on this delay model, a precise synchronization mechanism is devised by compensating the media's decoding and presentation time stamps.
This paper deals with a class of morphological operators called connected operators. These operators interact with the signal by merging flat zones. As a result, they do not create any new contours and are very attractive for filtering tasks where contour information has to be preserved. This paper focuses on a class of connected operators dealing with motion information. They remove from the original sequence the components that do not undergo a specific motion. They have a large number of applications, including image sequence analysis with motion multiresolution decomposition and motion estimation.
In this paper, we present a new approach to robust 3D rigid-body motion estimation and scene structure recovery using an epipolar corridor. In contrast to traditional two-stage approaches, we do not rely on independent establishment of feature point correspondences followed by computation of the 3D motion parameters; instead, we iteratively feed the motion parameters back to the point correspondence estimator, restricting the search space to an epipolar corridor. As the iterations proceed, we narrow the width of the corridor and reach a stable solution with all point correspondences obeying the epipolar line constraint. The least-median-of-squares estimator is integrated into the 3D motion parameter estimation framework to deal with the multi-motion problem. The position of the feature points along the epipolar line finally leads to structure recovery from motion. Experimental results using real and synthetic image sequence data show the ability of the approach to robustly estimate 3D motion parameters.
This paper presents an approach to realistic motion field estimation. In this approach, an image is first segmented into homogeneous regions using a new multiscale gradient algorithm followed by the watershed transformation. The multiscale gradient algorithm efficiently solves the over-segmentation problem of the watershed transformation, increases segmentation accuracy, and reduces the computational cost. The motion field is then estimated using block matching with a consistency constraint. The consistency constraint function is defined by the neighboring motion vectors and the segmentation map. Simulation results show that the motion fields generated by block matching with the consistency constraint are very smooth within each object, approaching realistic motion fields, even when a small block size is used.
This paper proposes a reliability metric for multiparameter motion estimation. The proposed reliability is defined as an ensemble average of the squared estimation error for multiple motion parameters, extending our previous work on translational motion models to multiparameter motion models. The reliability is then applied to a four-parameter motion model to illustrate the validity of the approach. A simplified version of the reliability is also considered to clarify its meaning and properties. Finally, the proposed reliability is applied to an extended block-matching method to reduce computational cost without increasing estimation error.
Starting from the linearized Taylor series expansion, an iterative, gradient-based method is used to estimate the zoom and pan motion parameters. To take into account the higher-order expansion error, Wiener filtering techniques are investigated. It is shown that the expansion error can be efficiently removed with the Wiener filter. However, the reliability of the estimated parameters remains highly dependent on the accuracy of the gradient information of the images.
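One step of such a gradient-based estimate can be sketched in 1-D. This is our own illustration under a displacement model d(x) = zoom·x + pan and the first-order expansion I1(x) ≈ I0(x) + d(x)·I0'(x); it omits the iteration and the Wiener filtering described in the abstract:

```python
def estimate_zoom_pan(I0, I1):
    """One Gauss-Newton step for a 1-D zoom/pan model: linearise
    I1(x) ~ I0(x) + (zoom*x + pan) * I0'(x) and solve the 2x2
    least-squares normal equations for (zoom, pan)."""
    n = len(I0)
    # central-difference spatial gradient (zero at the boundaries)
    g = [0.0] + [(I0[i + 1] - I0[i - 1]) / 2 for i in range(1, n - 1)] + [0.0]
    a11 = a12 = a22 = b1 = b2 = 0.0
    for x in range(n):
        t = I1[x] - I0[x]          # temporal difference
        a11 += (g[x] * x) ** 2
        a12 += g[x] * x * g[x]
        a22 += g[x] ** 2
        b1 += g[x] * x * t
        b2 += g[x] * t
    det = a11 * a22 - a12 * a12
    if det == 0:
        return 0.0, 0.0            # gradients too weak to estimate
    zoom = (b1 * a22 - b2 * a12) / det
    pan = (a11 * b2 - a12 * b1) / det
    return zoom, pan
```

The estimate's dependence on the gradient g is visible directly: weak or noisy gradients shrink the determinant and destabilize the solution.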
Given the small bit allocation for motion information in very low bit-rate coding, motion estimation using the block matching algorithm (BMA) fails to maintain an acceptable level of prediction error. The reason is that the motion model, or spatial transformation, assumed in block matching cannot approximate real-world motion precisely with a small number of parameters. To overcome this drawback of the conventional block matching algorithm, several triangle-based methods, which use triangular patches instead of blocks, have been proposed. To estimate motion in image sequences, these methods usually combine the optical flow equation, the affine transform, and iteration, but their computational cost is high. This paper presents a fast motion estimation algorithm using a multiple linear regression model to remedy the defects of the BMA and the triangle-based methods. After describing the basic 2D triangle-based method, the details of the proposed multiple linear regression model are presented, along with motion estimation results on one standard video sequence representative of MPEG-4 class A data. The simulation results show that the proposed method improves average PSNR by about 1.24 dB compared with the BMA, while reducing computational cost by about 40 percent compared with the 2D triangle-based method.
All present standards for low bit-rate video coding are based on an interframe motion-compensated hybrid scheme. These systems offer poor error resilience, making them inappropriate for the transmission of video sequences over noisy channels. The introduction of forward error correction (FEC) codes may help the decoding process, but the consequent increase in the transmission rate has to be compensated by throttling the source coder to even lower rates. Worse, in the case of fading or highly noisy channels, the FEC may not be sufficient to correct the transmission errors.
We introduce the concept of motion-adaptive spatio-temporal model-assisted compatible (MA-STMAC) coding, a technique to selectively encode areas of different importance to the human eye, in both space and time, in moving images while taking object motion into account. The previous STMAC approach was proposed based on the fact that 'eye contact' and 'lip synchronization' are very important in person-to-person communication. Several areas, including the eyes and lips, need different types of quality, since different areas have different perceptual significance to human observers. The approach provides a better rate-distortion tradeoff than conventional image coding techniques based on MPEG-1, MPEG-2, H.261, and H.263. STMAC coding is applied on top of an encoder, taking full advantage of its core design. Model motion tracking in our previous STMAC approach was not automatic. The proposed MA-STMAC coding considers the motion of the human face within the STMAC concept using automatic area detection. Experimental results are given using ITU-T H.263, addressing very low bit-rate compression.
In this paper, the Shepard interpolation algorithm has been used for compact still image representation and for motion estimation in video sequences. In the context of still image representation, the goal is to compute the values and locations of a fixed number of samples to approximate a natural grey-level image efficiently. These values are found by a relaxation process. For image coding, sample values and sample locations have to be coded. Quantization is applied to the sample values to reduce the amount of information. Sample locations are represented through a binary image, which is coded using an arithmetic coder. In the motion estimation scheme, the Shepard interpolation algorithm is used to construct a dense vector field from a few vectors. This dense vector field is used to predict the current frame. Three schemes are presented. In the first, vectors are regularly located and a relaxation algorithm is used to compute the 'best' vector values. In the other two, a relaxation method is used to compute the 'best' locations and 'best' values of the vectors. Experimental results on several images and video sequences show the efficiency of this non-uniform sampling of the data.
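Shepard interpolation itself is the classical inverse-distance-weighted average of scattered samples; a minimal 2-D version (generic formulation, not the paper's relaxation scheme) is:

```python
def shepard(samples, x, y, power=2):
    """Shepard interpolation: inverse-distance-weighted average of
    scattered samples [(xi, yi, vi), ...] evaluated at (x, y)."""
    num = den = 0.0
    for xi, yi, vi in samples:
        d2 = (x - xi) ** 2 + (y - yi) ** 2
        if d2 == 0.0:
            return vi              # exact hit on a sample location
        w = d2 ** (-power / 2)     # weight = 1 / distance**power
        num += w * vi
        den += w
    return num / den
```

Evaluating this at every pixel from a sparse set of motion vectors yields the dense vector field used for prediction.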
Wavelet coding is often used to divide an image into multi-resolution wavelet coefficients, which are then quantized and coded. By 'vectorizing' scalar wavelet coding and combining it with vector quantization (VQ), vector wavelet coding (VWC) can be implemented. Using a finite number of states, finite-state vector quantization (FSVQ) takes advantage of the similarity between frames by incorporating memory into the video coding system. Lattice VQ eliminates the potential mismatch that can occur with pre-trained VQ codebooks. It also eliminates the need for codebook storage in the VQ process, yielding a more robust coding system. By using VWC in conjunction with FSVQ and lattice VQ, a high-quality very low bit-rate coding system is therefore proposed. A coding system using a simple FSVQ, in which the current state is determined by the previous channel symbol only, is developed. To achieve a higher degree of compression, a tree-like FSVQ system is implemented. The groupings in this tree-like structure run from the lower subbands to the higher subbands, to exploit the parent-child relationship inherent in subband analysis. Class A and Class B video sequences from the MPEG-4 testing evaluations are used to evaluate this coding method.
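Why lattice VQ needs no stored codebook can be seen from a concrete lattice quantizer. As an illustration (the abstract does not say which lattice the authors use), the standard nearest-point rule for the D4 lattice, whose points are the integer 4-vectors with even coordinate sum:

```python
def nearest_d4(v):
    """Quantise a 4-vector to the D4 lattice (integer points with even
    coordinate sum). The codeword is computed, not looked up, which is
    what removes the codebook storage mentioned in the text."""
    r = [round(x) for x in v]
    if sum(r) % 2 == 0:
        return r
    # parity is odd: re-round the component with the largest rounding
    # error in the opposite direction
    errs = [abs(x - ri) for x, ri in zip(v, r)]
    k = errs.index(max(errs))
    r[k] += 1 if v[k] > r[k] else -1
    return r
```
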
Images usually have quite different properties from place to place. This variation in local properties may hurt coding efficiency when a fixed coding method is used. In this paper, a new coding method is proposed that selects the discrete cosine transform (DCT) or the wavelet transform (WLT) block by block, adaptively, according to the local properties of the image, to achieve high coding efficiency. A separable WLT, which gives higher coding efficiency and can be treated in the same manner as the DCT, is proposed to simplify the hardware/software implementation. An adaptive scanning method associated with the separable WLT is also proposed; it achieves high coding efficiency by scanning the corresponding large-valued coefficients across resolutions first, to shorten the zero runs. Furthermore, for coding arbitrarily shaped images with block-based DCT and WLT, a new padding technique based on the concept of smoothing boundary blocks is proposed. These new coding methods are verified through simulation in comparison with the MPEG-4 verification model. The simulation results show PSNR improvements of up to 2 dB.
A hybrid video compression scheme is reviewed, and the incorporation of regions of interest into this scheme is investigated. The DFD coding method employed is, in its main aspects, close to the classical pyramid approach of Burt and Adelson. In particular, a centered least-squares Laplacian pyramid is used, which decomposes the DFD into several levels of differing spatial resolution. This pyramid is quantized and coded following a layered quantization approach together with a layered coding method based on conditional arithmetic coding. The DFD encoder outputs an embedded bit stream; thus the coder control may truncate the bitstream at any point and can maintain a fixed rate. Simulation results show that incorporating regions of interest can further improve the rate-distortion performance for low bit-rate video coding.
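The Burt-Adelson decomposition referenced above stores, at each level, the residual between the signal and a prediction expanded from its coarser version. A deliberately simplified 1-D sketch (naive decimation and linear interpolation instead of the paper's centered least-squares filters) that still shows the perfect-reconstruction structure:

```python
def reduce_(x):
    """Coarsen by keeping every second sample (naive decimation)."""
    return [x[i] for i in range(0, len(x), 2)]

def expand(x, n):
    """Upsample back to length n with linear interpolation."""
    out = [0.0] * n
    for i, v in enumerate(x):
        out[2 * i] = v
    for i in range(1, n, 2):
        left = out[i - 1]
        right = out[i + 1] if i + 1 < n else left
        out[i] = (left + right) / 2
    return out

def laplacian_pyramid(x, levels):
    """Each level stores the residual x - expand(reduce(x)); the final
    entry is the coarsest approximation."""
    pyr = []
    for _ in range(levels):
        coarse = reduce_(x)
        pred = expand(coarse, len(x))
        pyr.append([a - b for a, b in zip(x, pred)])
        x = coarse
    pyr.append(x)
    return pyr

def rebuild(pyr):
    """Invert the pyramid: expand and add residuals, coarse to fine."""
    x = pyr[-1]
    for detail in reversed(pyr[:-1]):
        x = [a + b for a, b in zip(expand(x, len(detail)), detail)]
    return x
```

Because each level undoes exactly the prediction made during analysis, reconstruction is exact in the absence of quantization; the layered quantizer then operates on the residual levels.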
We report on recent advances in traditional DCT-based video coding at low bitrates. These improvements allow either an increase in coding efficiency or an increase in other functionalities. Our investigation is conducted within the framework of the ongoing work towards the MPEG-4 video standard. The ISO moving picture experts group (MPEG) is currently developing this standard, having completed the MPEG-1 and MPEG-2 standards. The MPEG-4 video standard addresses a number of content-based as well as traditional functionalities. The development process consists of iterative refinement of the verification model via a set of well-defined core experiments. Our first experiment concerns improved coding efficiency of intra coding and uses DC and AC prediction and optimized scanning of DCT coefficients, followed by a separate optimized variable-length code table. Our second experiment studies bidirectional coding to allow additional functionality, such as temporal scalability, at low bit rates. We present the results of these experiments and summarize our findings.
Object-based coding is a new technique being investigated to achieve extremely high compression ratios for video sequences. The 'classical' approach is a four-step algorithm: segmentation; motion estimation and compensation; coding of the segmentation information; and finally coding of the prediction errors. In this text we describe an approach that combines the first three steps: motion estimation and segmentation are performed together, while collecting information that is important for coding the segmentation. The segmentation is obtained by finding motion parameters that yield the best segmentation and growing the segmentation from areas with a good prediction. The segmentation is coded using cost functions calculated in the segmentation step. The actual coding is done with simple geometric forms, in our case superquadrics, combined to arrive at an approximated shape.
This paper proposes mesh-based methods for object-based video coding in a context similar to MPEG-4. The mesh-based approach is aimed at supporting a wide variety of applications for object-based bitstream manipulation and editing. We combine recent mesh-based motion models with the object-based video representation of MPEG-4. A mesh tracking algorithm is developed that enables the tracking of foreground video objects through forward motion estimation, piecewise affine warping, and dynamic mesh updates. Techniques to compress the mesh node positions and motion vectors are discussed. The proposed methods were tested on MPEG test sequences of varying complexity. We have replaced the block-based motion compensation of the MPEG-4 video verification model with mesh-based tracking and motion compensation and evaluated the performance of the resulting object-based coder.
A number of object-oriented coding algorithms have been proposed for coding video sequences at low bit rates. Instead of estimating the motion of pixel blocks, these algorithms segment each image into regions of uniform motion and estimate the motion of these regions. Estimating the segmentation and computing the motion parameters are evidently closely related. Most algorithms iteratively compute complex motion parameters and segmentation estimates, and are typically computationally intensive. Image intensity segmentations have also been used instead of motion field segmentations, based on the hypothesis that adjacent pixels with similar luminance values are part of the same object and therefore share common motion parameters. We previously proposed a simple two-stage algorithm in which 1) a translational block-motion field is used to compute a translational motion field and its segmentation, and 2) an optical flow field is then used to compute affine motion parameters for each segmented region. In this paper, we propose to replace the translational block-motion field by another translational motion field that assigns a motion vector to each region of an image intensity segmentation. This approach combines the advantages of both intensity and motion field segmentations, and generates motion field segmentations that match the scene content more closely with 15 to 25 percent fewer objects, thereby reducing the side bit rate required for coding the motion field segmentation.
Region-based coding methods, thanks to their 'scene-adaptive' nature, can provide maximal quality for the transmission of video sequences over low bit-rate channels. The visual quality of frames predicted by motion compensation on nearly semantically homogeneous regions is better than with conventional block-based and quadtree-based methods. If the error signal is not encoded, the most bit-consuming component is the description of the region borders and of the topology of the segmentation map. A planar graph with polygonal arcs is used to represent the geometrical form and relations of the regions in each frame. A method that adapts the segmentation description to the available bit rate is proposed, based on rate-distortion theory and constrained optimization. The method uses the concept of description layers. The 'basic' layer contains only the triplet nodes of the graph and the vertices of highest contrast and curvature. The 'maximal enhancement' layer contains all the nodes and polygonal vertices of the segmentation graph. The 'optimal' layer for each polygonal arc is chosen independently by minimizing a Lagrangian cost function, which combines rate and distortion measures. The entropy estimate for encoding all vertices of a given arc is taken as the rate measure. The distortion measure is the total sum of the squared DFD in the area delimited by the basic layer and the maximal enhancement layer for a given arc. Experiments on the 420-625 CCIR videoconference sequence showed a 30 percent decrease in bit rate for an unnoticeable loss in quality.
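The per-arc layer choice described above is a standard Lagrangian selection: given a (rate, distortion) pair for each candidate layer, minimize J = D + λR. A minimal sketch of that selection step (our illustration; the paper's rate and distortion measures are the entropy estimate and the squared-DFD sum):

```python
def choose_layer(options, lam):
    """Pick the description layer with minimal Lagrangian cost
    J = D + lambda * R. options: list of (rate, distortion) pairs,
    ordered from the 'basic' to the 'maximal enhancement' layer;
    returns the index of the chosen layer."""
    return min(range(len(options)),
               key=lambda k: options[k][1] + lam * options[k][0])
```

Sweeping λ trades border fidelity against segmentation bit rate: a large λ pushes arcs toward the cheap basic layer, a small λ toward full enhancement.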
The extension of conventional video coding schemes with additional content-based functionalities is a key issue in MPEG-4 standardization. To this end, suitable techniques for coding the texture information of arbitrarily shaped image segments have to be developed. In this paper, a new approach for coding such image segments is presented which is based on signal extrapolation. In contrast to other extrapolation approaches described in the literature, the proposed method is computationally simple and specifically adapted to the needs of real-time video coding. The extrapolation works in the spatial domain and yields a signal with high energy concentration in the low-frequency area. Restricting the extrapolation to a block grid allows a straightforward extension of common block-based video coding techniques to the coding of arbitrarily shaped image objects.
This article deals with the coding of segmentation maps used in region-based video coding. The scheme proposed to code these maps is based on an efficient edge representation using a graph of contours. Lossless and lossy compression are then applied in order to obtain the cheapest representation. Lossless compression is based on a Freeman chain code combined with arithmetic coding; lossy compression is based on polygonal approximation of the contours controlled by a minimum description length criterion. We compare these two encoding schemes on different sequences and initial segmentation maps, obtaining 1.3 bits per contour point and 0.5 bits per contour point, respectively. We also discuss how to optimize the segmentation map and its accuracy.
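As a minimal illustration of the lossless branch, the sketch below computes the Freeman 8-direction chain code of a contour given as a point list; the square contour is a made-up example, and the arithmetic coding stage that would follow is not shown.

```python
# Hedged sketch: Freeman 8-direction chain coding of a contour, the
# lossless representation the abstract pairs with arithmetic coding.

# Map (dx, dy) steps between successive contour points to codes 0..7.
DIRS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
        (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(points):
    """Return the Freeman chain code of a contour given as (x, y) points."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        codes.append(DIRS[(x1 - x0, y1 - y0)])
    return codes

# A hypothetical closed contour: a unit square traced point by point.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(chain_code(square))  # → [0, 2, 4, 6]
```

Because each symbol is one of only eight values with highly skewed statistics along smooth contours, the chain code is a natural input for an adaptive arithmetic coder.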
We introduce a novel region-based video compression framework based on morphology to efficiently capture motion correspondences between consecutive frames in an image sequence. Our coder is built on the observation that the motion field associated with typical image sequences can be segmented into component motion subfield 'clusters' associated with distinct objects or regions in the scene, and further that these clusters can be efficiently captured using morphological operators in a 'backward' framework that avoids the need to send region shape information. Region segmentation is performed directly on the motion field by introducing a small 'core' for each cluster that captures the essential features of the cluster and reliably represents its motion behavior. Cluster matching is used in lieu of the conventional block matching methods of standard video codecs to define a cluster motion representation paradigm. Furthermore, a region-based pel-recursive approach is applied to find the refinement motion field for each cluster, and the cluster motion prediction error image is coded by a novel adaptive scalar quantization method. Experimental results reveal a 10-20 percent reduction in prediction error energy and a 0.5-2 dB gain in final reconstructed PSNR over a standard MPEG-1/2 coder at typical bitrates of 500 kbit/s to 1 Mbit/s on standard test sequences.
Model-supported exploitation is a new paradigm in image understanding research. In this paradigm, three main technical areas have been identified: semi-automatic construction of site models, automated positioning of images relative to the sites, and monitoring of movable objects and construction activities. In this paper, we summarize recent progress in the detection and counting of vehicles in selected locales and in the monitoring and characterization of vehicle groupings. We present the algorithms used and the results obtained. The detection and counting method employs geometrical models and uses a spatial contour matching approach. The configuration detection method exploits knowledge of geometrical models in the frequency domain. The issues of parameter learning, as well as the sensitivity of detection performance to misspecification of model and tuning parameters, are briefly discussed.
Neutralizing the threat of an incoming ballistic missile is a difficult task. Often the missile disintegrates, leaving the warhead surrounded by a number of ballistic fragments. Among these fragments only the warhead must be intercepted. Thus the challenge is for the interceptor to identify and track the warhead so that a successful strike can be achieved. This paper addresses the problem of using noisy image sequence data captured by on-board sensors in the nose cone of the interceptor to detect warheads among fragments. Detection and tracking necessitate the extraction of reliable motion estimates, which is often difficult when the data sequence is noisy. We propose a simple algorithm that exploits both spatial and temporal correlation to suppress noise in the observed image sequence. Our algorithm calculates reliable estimates of the global motion caused by camera movement and uses these motion estimates to significantly reduce the noise in the video sequence.
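The core of such a scheme is estimating the camera-induced global motion and then averaging aligned frames so independent noise cancels. The sketch below is a toy version under strong simplifying assumptions: integer translational motion only, exhaustive search over a small window, and tiny lists-of-lists "frames"; a real sensor pipeline would need subpixel and rotational models.

```python
# Hedged sketch: estimate a global translational shift between two
# frames by minimizing the sum of squared differences (SSD); aligned
# frames can then be averaged to suppress independent noise.

def shift(frame, dx, dy):
    """Cyclically shift a frame (list of rows) by (dx, dy)."""
    h, w = len(frame), len(frame[0])
    return [[frame[(y - dy) % h][(x - dx) % w] for x in range(w)]
            for y in range(h)]

def estimate_shift(ref, cur, radius=2):
    """Find the (dx, dy) that best explains cur as a shifted ref."""
    best = None
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            cand = shift(cur, -dx, -dy)  # undo a hypothesized shift
            ssd = sum((a - b) ** 2
                      for ra, rb in zip(ref, cand)
                      for a, b in zip(ra, rb))
            if best is None or ssd < best[0]:
                best = (ssd, dx, dy)
    return best[1], best[2]

ref = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]
cur = shift(ref, 1, 0)  # scene moved one pixel to the right
print(estimate_shift(ref, cur))  # → (1, 0)
```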
This paper addresses the problem of tracking a ballistic missile warhead. In this scenario, the ballistic missile is assumed to be fragmented into many pieces. The goal of the algorithm presented here is to track the warhead that is among the fragments. It is assumed that images are acquired from an optical sensor located in the interceptor nose cone. This imagery is used by the algorithm to steer the course of interception. The algorithm proposed in this paper is based on continuous spatio-temporal wavelet transforms (CWTs). Two different energy densities of the CWT are used to perform velocity detection and filtering. Additional post-processing is applied to discriminate among objects traveling at similar velocities. Particular attention is given to achieving robust performance on noisy sensor data and under conditions of temporary occlusions. First we introduce the spatio-temporal CWT and stress the relationships with classical orientation filters. Then we describe the CWT- based algorithm for target tracking, and present results on synthetically generated sequences.
Many battlefield applications require the ability to transmit images over narrow-bandwidth noisy channels. Previous research has demonstrated that predictive trellis-coded quantization (PTCQ) incorporating a nonlinear prediction filter yields a robust source coding method. Robust source coding provides both compression and noise mitigation without the need to allocate additional bandwidth for channel coding. However, the traditional PTCQ algorithm is suboptimal. This suboptimality arises from the prediction operation: a trellis path is eliminated in favor of the survivor path at each stage in time to form the input to the prediction filter. It is reasonable to assume that the eliminated path may have produced a lower overall distortion than the survivor path. In this paper we address this suboptimality by incorporating a look-ahead stage into the PTCQ algorithm. This 'less greedy' approach allows coding gains with a slight increase in overhead. The resulting algorithm yields an image encoding technique that enables resilient image transmission over tactical channels.
Compression of SAR imagery for battlefield digitization is discussed in this paper. The images are first processed to separate out possible target areas. These target areas are compressed losslessly to avoid any degradation of the images. The background information, which is usually necessary to establish context, is compressed using a hybrid vector quantization algorithm. An adaptive variable-rate residual vector quantizer is used to compress the residual signal generated by a neural network predictor. The vector quantizer codebooks are optimized for entropy coding using an entropy-constrained algorithm to further improve the coding performance. This entropy-constrained vector quantizer combination performs extremely well, as the experimental results suggest.
We address efficient context modeling in arithmetic coding for wavelet image compression. Quantized highpass wavelet coefficients are first mapped into a binary source, followed by high-order context modeling in arithmetic coding. A blending technique is used to combine the results of context modeling of different orders into a single probability estimate. Experiments show that an arithmetic coder with efficient context modeling is capable of achieving a 10% bitrate saving (or 0.5 dB gain in PSNR) over a zeroth-order adaptive arithmetic coder in high-performance wavelet image coders.
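A minimal sketch of the blending idea for a binary source: probability estimates from contexts of several orders are combined into one estimate, with more weight given to contexts that have actually been observed. The escape-style weighting used here is an illustrative assumption, not the blending rule of the paper.

```python
# Hedged sketch: blend order-0..order-N context models of a binary
# source into a single P(next bit = 1), PPM-style. Laplace-smoothed
# counts; the weighting scheme is an illustrative choice.

from collections import defaultdict

class BlendedModel:
    def __init__(self, max_order=2):
        self.max_order = max_order
        self.counts = [defaultdict(lambda: [1, 1])  # [count0, count1]
                       for _ in range(max_order + 1)]
        self.history = []

    def prob_one(self):
        p, weight_left = 0.0, 1.0
        for order in range(self.max_order, -1, -1):
            ctx = tuple(self.history[-order:]) if order else ()
            c0, c1 = self.counts[order][ctx]
            seen = c0 + c1 - 2                     # real observations
            w = weight_left * seen / (seen + 1.0)  # trust seen contexts more
            p += w * c1 / (c0 + c1)
            weight_left -= w
        return p + weight_left * 0.5               # leftover mass: uniform

    def update(self, bit):
        for order in range(self.max_order + 1):
            ctx = tuple(self.history[-order:]) if order else ()
            self.counts[order][ctx][bit] += 1
        self.history.append(bit)

model = BlendedModel(max_order=2)
for bit in [1, 0, 1, 0, 1, 0, 1, 0]:   # strongly alternating source
    model.update(bit)
print(model.prob_one() > 0.5)  # → True: after ...1,0 a 1 is expected
```

In an arithmetic coder, `prob_one()` would drive the interval split and `update()` would run after each decoded or encoded bit.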
Vector quantization (VQ) based on a fixed block-size classification (FBSC) model, known as classified VQ (CVQ), offers a useful solution to the edge degradation problem of conventional image VQ. In our previous work, we developed a VQ technique based on a variable block-size classification (VBSC) model, in which an image is segmented into blocks of various sizes, and each segmented region is encoded at a different rate according to its level of detail. The low-detail regions of the image consist of variable-size blocks and are encoded at very low bitrates with little perceptual degradation. High-detail regions, which are isolated into the smallest blocks, are classified into various edge classes, each of which is encoded separately. In this paper, a rate-distortion function (RDF), R(D), is presented for a VBSC model. We obtain a theoretical R(D) bound on the performance of VQ based on a VBSC model. It is proven theoretically that the R(D) bound of the VBSC model is lower than those of the Gaussian model and the FBSC model. We also evaluate an RDF for the VBSC model experimentally and compare it with the theoretical RDF. There is a gap of about 0.1 bpp between the theoretical and experimental RDFs in VBSC model-based VQ coding. We expect that this gap can be reduced by subsequently employing an entropy coder.
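The Gaussian baseline mentioned above is the classical rate-distortion function R(D) = (1/2) log2(sigma^2/D) for 0 < D <= sigma^2. The sketch below evaluates it and illustrates, with made-up class fractions and variances, why classifying blocks by activity can only lower the average rate relative to pooling everything into one source.

```python
# Hedged sketch: memoryless Gaussian rate-distortion function and a toy
# illustration of the classification gain; the class statistics are
# invented for the example.

import math

def gaussian_rdf(variance, distortion):
    """Bits per sample needed to reach mean squared error `distortion`."""
    if distortion >= variance:
        return 0.0  # the mean alone already meets the distortion target
    return 0.5 * math.log2(variance / distortion)

print(gaussian_rdf(4.0, 1.0))  # → 1.0 bit/sample

# Hypothetical classes: (fraction of blocks, variance of that class).
classes = [(0.7, 0.5), (0.3, 16.0)]
avg = sum(p * gaussian_rdf(v, 1.0) for p, v in classes)
print(avg)  # → 0.6, below the rate for the pooled variance 5.15
```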
This paper addresses the problem of representing images and video objects of arbitrary shape with surface models based on active meshes. Usually, such a control point optimization relies on the mean squared error between the original signal and the model. However, computing this criterion requires parametrizing both the original and model surfaces, and it turns out that, on the one hand, the parametrizations of the original and model manifolds are independent and, on the other hand, that many choices exist. This approach therefore provides a solution that depends on these choices: change them, and the approximation result changes. To avoid this problem, we developed an iterative method based on surface evolution theory that computes the 3D control points by minimizing a purely geometric criterion. In this paper, we illustrate such an approach by computing an adaptive mesh-based representation of an image surface. This approach is particularly well suited both to compression and to manipulations such as spatial scalability and more general geometric transformations.
In modern color imaging a variety of color spaces are used to represent an image. It is well established in the literature that the choice of color space has an impact on the achievable compression. The purpose of this research is to investigate compression-related properties of various color spaces. We investigated what we consider the most popular color spaces in use today, namely RGB, YIQ, YUV, YCbCr, HSI, CMYK, XYZ and CIELab. The following properties of color spaces that influence compressibility have been studied: energy distribution among color planes, plane bandwidth, DCT energy compaction, and the impact of gamma correction. We also compared the compressibility of low-resolution images, which have been used for most of the compression results reported in the literature, with high-resolution images, which are becoming increasingly important for modern imaging applications. The findings of this research are illustrated by comparing actual JPEG compression results for the YCbCr and CIELab spaces.
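One of the properties studied, energy distribution among planes, can be illustrated with the standard BT.601 RGB-to-YCbCr conversion: for typical near-gray content, most of the signal energy lands in the Y plane, which is why chroma planes compress so cheaply. The pixel values below are made up for the example.

```python
# Hedged sketch: BT.601 full-range RGB -> YCbCr conversion and a crude
# per-plane energy measure (mean squared deviation from the plane's
# neutral level).

def rgb_to_ycbcr(r, g, b):
    """ITU-R BT.601 full-range conversion of one pixel."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

def plane_energy(values, offset=0.0):
    return sum((v - offset) ** 2 for v in values) / len(values)

# A hypothetical near-gray patch: chroma stays near its neutral 128,
# so almost all energy concentrates in the Y plane.
pixels = [(120, 125, 118), (60, 62, 58), (200, 198, 205)]
ys, cbs, crs = zip(*(rgb_to_ycbcr(*p) for p in pixels))
print(plane_energy(ys) > plane_energy(cbs, 128))  # → True
print(plane_energy(ys) > plane_energy(crs, 128))  # → True
```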
In this paper we propose an image coding scheme based on the polynomial transform and multiresolution analysis. The polynomial transform is an image representation model that mimics some properties of the human visual system, and which we use to model edges in terms of their characteristic parameters. Based on the polynomial transform, we build a pyramidal hierarchical predictive scheme for image coding. The feature parameters that we encode are: local average, edge orientation, edge position and edge magnitude.
In this paper, we present several theoretical results connected with the Mojette transform defined at VCIP '96. The direct and inverse transforms are extended to different implementations, and the choice of the set of projections is analyzed in depth. Applications to image coding and image analysis are sketched to show the links between the Mojette transform and other tools: block/wavelet decomposition for coding, and segmentation/texture for analysis.
2D fast cosine and sine transforms with regular structure are developed for 2^n x 2^n data points. These algorithms are extended versions of the 1D fast regular algorithms introduced in our recent paper. The rationale behind these 2D sine/cosine transform algorithms is a 2D decomposition of the data into 2D subblocks of reduced dimension, rather than separate 1D treatments of the columns and rows of the data set. As a result, the number of multiplications is 25 percent smaller than in the row-column approach. Numerous algorithms of this type were proposed previously for the discrete Fourier transform (DFT) and the discrete cosine transform of type 2 (DCT-II). In the DCT-II case the algorithms do not have a regular structure, as they do in the DFT case, and the motivation of this work is to derive 2D algorithms for discrete sine and cosine transforms with regular, constant-geometry structures. Extension to 2^n x 2^m data points is straightforward.
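For reference, the row-column baseline that the direct 2D decomposition improves upon applies a 1D DCT to every row and then to every column. The sketch below uses the plain O(N^2)-per-vector unscaled DCT-II definition, not a fast algorithm, purely to make the row-column structure concrete.

```python
# Hedged sketch: unscaled DCT-II by definition, applied row-column to
# form the 2D transform. Reference implementation only; no fast
# butterfly structure is attempted here.

import math

def dct2_1d(x):
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                for i in range(n))
            for k in range(n)]

def dct2_rowcol(block):
    rows = [dct2_1d(r) for r in block]        # 1D DCT along each row
    cols = list(zip(*rows))                   # transpose
    out = [dct2_1d(list(c)) for c in cols]    # 1D DCT along each column
    return [list(r) for r in zip(*out)]       # transpose back

block = [[1, 1], [1, 1]]                      # constant block
coef = dct2_rowcol(block)
print(round(coef[0][0], 6))  # → 4.0: all energy in the DC term
```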
Traditional objective metrics for the quality of coded images, such as the mean squared error (MSE) and the peak signal-to-noise ratio, do not correlate well with subjective human visual experience, since they do not take human perception into account. In this research, the quantification of artifacts resulting from lossy image compression techniques is studied based on a human visual system (HVS) model, and the time-space localization property of the wavelet transform is exploited to simulate the HVS. As a result, a new image quality measure using wavelet basis functions is proposed. This new metric works for a wide variety of compression artifacts. Experimental results demonstrate that it is more consistent with human subjective ranking.
Computer-aided diagnosis will be an important feature of next-generation picture archiving and communication systems. In this paper, computer-aided detection of microcalcifications in mammograms using a nonlinear subband decomposition and outlier labeling is examined. The mammogram image is first decomposed into subimages using a nonlinear subband decomposition filter bank. A suitably identified subimage is divided into overlapping square regions, in which skewness and kurtosis, as measures of the asymmetry and impulsiveness of the distribution, are estimated. A region with high positive skewness and kurtosis is marked as a region of interest. Finally, an outlier labeling method is used to find the locations of microcalcifications in these regions. Simulation studies are presented.
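The region-marking step can be sketched as computing sample skewness and excess kurtosis over each region and flagging regions where both are high. The thresholds and sample values below are illustrative guesses, not those of the paper.

```python
# Hedged sketch: flag a region when its sample skewness and excess
# kurtosis are both high, indicating an asymmetric, impulsive
# distribution as produced by bright microcalcification pixels.

def skew_kurt(samples):
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m3 = sum((s - mean) ** 3 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0  # excess kurtosis (0 for a Gaussian)
    return skew, kurt

def is_region_of_interest(samples, skew_thr=0.5, kurt_thr=1.0):
    s, k = skew_kurt(samples)
    return s > skew_thr and k > kurt_thr

smooth = [10, 11, 9, 10, 10, 11, 9, 10]
spiky  = [10, 10, 10, 10, 10, 10, 10, 90]  # one bright impulse
print(is_region_of_interest(smooth))  # → False
print(is_region_of_interest(spiky))   # → True
```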
This paper describes a method for selecting key frames using a number of parameters extracted from an MPEG video stream. The parameters are extracted directly from the compressed video stream without decompression. A combination of these parameters is then used in a rule-based decision system. The computational complexity of extracting the parameters and of the key frame decision rule is very small. As a result, the overall operation is performed very quickly, which makes our algorithm handy for practical purposes. The experimental results show that this method can successfully select the distinctive frames of video streams.
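A rule-based decision of this kind can be sketched as a vote over simple threshold tests on compressed-domain parameters. The particular parameters (bit-count ratio, intra-macroblock fraction, mean motion-vector magnitude) and thresholds below are illustrative assumptions, not the rules of the paper.

```python
# Hedged sketch: majority vote over threshold rules applied to
# parameters of the kind readable from an MPEG stream without decoding.
# Parameter names and thresholds are hypothetical.

def is_key_frame(frame, bits_thr=1.5, intra_thr=0.4, mv_thr=8.0):
    """frame: dict of parameters extracted from the compressed stream."""
    rules = [
        frame["bits_ratio"] > bits_thr,       # unusually expensive frame
        frame["intra_fraction"] > intra_thr,  # many intra-coded blocks
        frame["mean_mv_mag"] > mv_thr,        # large motion activity
    ]
    return sum(rules) >= 2  # at least two rules must fire

calm = {"bits_ratio": 1.0, "intra_fraction": 0.05, "mean_mv_mag": 1.2}
cut  = {"bits_ratio": 2.4, "intra_fraction": 0.80, "mean_mv_mag": 3.0}
print(is_key_frame(calm))  # → False
print(is_key_frame(cut))   # → True
```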
Multimedia data processing is becoming a central concern among people working on image processing, and multimedia database retrieval is one such problem. A foreign-language study assisting system is a good example of a multimedia database design, because each utterance depends on the conversational situation, such as the topic and speech intention, as well as the place of conversation. In that case, we cannot neglect the semantic aspect of multimedia information. The authors' group has already proposed a semantic structure description form of language meaning, called the SD-form, and has studied the feasibility of its application to natural language generation, story understanding, and conversational text retrieval systems. This paper presents our new attempt to expand our previous system from a text database system to a multimedia database system that includes motion pictures and speech sound as well as language text. The source of the data in this project is a series of bilingual TV dramas broadcast in Japan. The most important point of this attempt is that each video scene is described by a set of SD-forms, by which scenes are retrieved semantically.
Indexing and retrieval of image sequences are fundamental steps in video editing and film analysis. Correlation-based matching methods are known to be very expensive on large amounts of data. As the size of a sequence database grows, traditional retrieval methods fail: exhaustive search quickly breaks down as an efficient strategy, and traditional indexing with labels has many drawbacks since it requires human intervention. New advanced correlation filters are being proposed to decrease the computational load of the task. A new method for retrieval of image sequences in a large database, based on a spatio-temporal wavelet decomposition, is proposed here. It will be shown how the multiresolution approach leads to good results in terms of computational efficiency and robustness to noise. We assume that the query sequence may not be contained in the database, for several reasons: the presence of noise on the query, a different digitization process, or a query that is only similar to sequences in the database. We have therefore developed a new, efficient retrieval strategy that analyses the database to extract the sequences most similar to a given query. The wavelet transform has been chosen as the framework for the multiresolution formalism because of its good compression capabilities, especially in embedded schemes, and the good features it provides for signal analysis. This paper describes the principles of the multiresolution sequence matching strategy and illustrates its performance through a series of experimental simulations.
The MPEG-4 object-based coding standard, designed as a common platform for all multimedia applications, is inherently well suited for video indexing applications. To fully exploit the advantages offered by MPEG-4, however, a reconsideration of existing indexing strategies is required. This paper proposes a new object-based framework for video indexing and retrieval that treats the object itself as the basic indexing unit, where changes in content are detected through observations made on the objects in the video sequence. We present a temporal segmentation algorithm designed to automatically extract key frames for each video object in an MPEG-4 compressed sequence, based on the prediction model chosen by the encoder for individual macroblocks. An extension to the existing MPEG-4 syntax is presented for conducting and facilitating vast database searches. The data presented in the proposed 'indexing field' are: the birth and death frames of individual objects, global motion characteristics and camera operations observed in the scene, representative key frames that capture the major transformations each object undergoes, and the dominant motion characteristics of each object throughout its lifetime. We demonstrate the validity of the proposed scheme with results obtained on several MPEG-4 test sequences.
We introduce two texture classification techniques applicable to images compressed using block DCT. The first technique is a parametric approach. It models a texture as a stationary Gaussian process and utilizes the diagonalizing property of DCT. The second one uses the concept of power spectrum in the DCT domain. The energy distribution is employed to discriminate different textures. Both techniques work on compressed data without decoding and are designed to be robust against quantization noise.
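The second technique, discrimination by energy distribution in the DCT domain, can be sketched as collecting coefficient energy into coarse frequency bands of an 8x8 block and using the normalized band energies as a feature vector. The band partition below (DC / low / mid / high by u+v) is an illustrative choice, not the paper's.

```python
# Hedged sketch: texture features as normalized energies over coarse
# frequency bands of an 8x8 block of DCT coefficients, computable
# directly on compressed data.

def band_energies(block):
    """block: 8x8 list of DCT coefficients -> 4 normalized band energies."""
    bands = [0.0, 0.0, 0.0, 0.0]
    for u in range(8):
        for v in range(8):
            e = block[u][v] ** 2
            if u == v == 0:
                bands[0] += e          # DC
            elif u + v <= 3:
                bands[1] += e          # low frequencies
            elif u + v <= 8:
                bands[2] += e          # mid frequencies
            else:
                bands[3] += e          # high frequencies
    total = sum(bands) or 1.0
    return [b / total for b in bands]

# A hypothetical "smooth" block: energy only in DC and one low AC term.
smooth = [[0.0] * 8 for _ in range(8)]
smooth[0][0], smooth[0][1] = 100.0, 10.0
feats = band_energies(smooth)
print(feats[0] > 0.9)  # → True: smooth textures are DC-dominated
```

Classifying then amounts to comparing such feature vectors, e.g. by nearest neighbor, which is why the method tolerates quantization noise: quantization perturbs individual coefficients more than band totals.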
Chroma-keying, or blue screen matting, is an important video editing operation. In blue screen matting, everything in the image that has the user-specified level for the blue channel is 'keyed' out and replaced by either another image or a color from a color generator. We develop a technique for blue screen matting that manipulates the image when the image information is available only in compressed form, such as a JPEG or MPEG bitstream. Specifically, for the compressed domain approach, we show that the matting process is a convolution operation; hence, we develop a DCT convolution theorem. The DCT convolution theorem can be used to show that the compressed domain approach proposed in this paper provides a significant reduction in computational complexity compared to previously developed approaches, which were mostly ad hoc techniques. The convolution theorem exploits the sparseness as well as the orthogonality of the data available in the DCT domain and thus yields an efficient algorithm for chroma-keying or blue screen matting. The algorithm extends to the DCT domain concepts such as the alpha channel and premultiplied alpha. The method is also extended to the MPEG video domain, wherein we explore the efficiency of the matting process when interframe coding is used.
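For orientation, the spatial-domain operation being moved into the DCT domain is ordinary alpha compositing: pixels whose blue channel matches the key level get alpha 0 and are replaced by the background, per out = alpha*fg + (1-alpha)*bg. The sketch below uses a hard binary matte and made-up pixel values; real matting uses a soft alpha.

```python
# Hedged sketch: spatial-domain blue screen matting with a hard binary
# alpha. fg/bg are flat lists of (r, g, b) tuples; key_blue and tol are
# illustrative parameters.

def matte(fg, bg, key_blue, tol=10):
    """Composite per pixel: out = alpha*fg + (1 - alpha)*bg."""
    out = []
    for f, b in zip(fg, bg):
        alpha = 0.0 if abs(f[2] - key_blue) <= tol else 1.0
        out.append(tuple(alpha * fc + (1 - alpha) * bc
                         for fc, bc in zip(f, b)))
    return out

fg = [(200, 30, 40), (10, 20, 250)]   # second pixel is blue-screen
bg = [(0, 0, 0), (90, 90, 90)]
print(matte(fg, bg, key_blue=255))    # keyed pixel becomes background
```

Because this per-pixel multiply-and-add is what becomes a convolution of DCT coefficients, the sparseness of quantized DCT blocks is what makes the compressed-domain version cheap.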
Image and video compression is becoming increasingly popular with the advent of broadband networks, high-powered workstations, and compression standards. Visual data are therefore expected to be stored in compressed format in future image and video databases. Recently, techniques for manipulating DCT parameters have been proposed in the literature. However, these techniques are not directly applicable to MPEG video because of the motion estimation/compensation procedure present in addition to DCT coding. In this paper, we propose a macroblock-type-selective encoder that implements scalable MPEG video coding using a combined DCT domain scaling operation and a modified DCT domain motion compensation technique. The proposed encoder removes the unnecessary decompression/re-compression and motion estimation procedures, and derives the motion information and the DCT coefficients of the prediction errors for the scaled macroblocks from the original motion vectors and DCT coefficients, respectively. Hence, it provides an elegant way to re-encode the scaled video while maintaining good video quality. In addition, it is compatible with the standard MPEG encoder.
In this paper, we propose a scene decomposition algorithm for MPEG compressed video data. As preprocessing for scene decomposition, partial reconstruction methods that extract DC images for P- and B-pictures as well as I-pictures directly from the MPEG bitstream are used. For detection, we exploit several methods for detecting abrupt scene changes, dissolves, and wipe transitions by comparing DC images between frames and using coding information such as motion vectors. We also propose a method for excluding undesired detections, such as those caused by flashlights, in order to enhance scene change detection accuracy. Experiments on more than one hour of TV programming show that a decomposition accuracy of more than 95 percent is obtained. We also find that the proposed algorithm performs scene change detection more than 5 times faster than normal playback speed on a 130-MIPS workstation.
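The abrupt-cut part of such a detector reduces to comparing successive DC images. A minimal sketch (the threshold and representation of DC images as flat lists are our illustrative assumptions, not the paper's settings):

```python
def dc_difference(dc_a, dc_b):
    """Mean absolute difference between two DC images (flat lists of DC values)."""
    return sum(abs(x - y) for x, y in zip(dc_a, dc_b)) / len(dc_a)

def detect_cuts(dc_images, threshold=30.0):
    """Flag frame i as an abrupt scene change when its DC image differs
    strongly from the previous frame's DC image (threshold is illustrative)."""
    return [i for i in range(1, len(dc_images))
            if dc_difference(dc_images[i - 1], dc_images[i]) > threshold]
```

Dissolve and wipe detection need more than a single-frame difference (e.g. trends over a window and motion-vector information), which is where the paper's additional methods come in.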
This paper presents an object-based synthetic-natural hybrid image coding scheme, where each image object is encoded individually, provided that its boundary is specified. This allows coding natural and synthetic image objects using different methods that are best suited to their content. It also allows object-based quality scalability, in addition to mixing lossy and lossless coding modes depending on the requirements of each image object. Furthermore, we propose a new object coding method using 2D mesh-based image sampling and interpolation, followed by encoding of the interpolation error image by a traditional data/waveform coding method. Experimental results on synthetic-natural hybrid test images are provided.
A new hybrid wavelet-fractal coder (WFC) for image compression is proposed in this research. We show that the application of contractive mapping for interscale wavelet prediction in the wavelet domain offers bit rate savings in some regions. The prediction residue is then quantized and encoded by traditional wavelet coders. WFC allows the flexibility to choose either direct coding of wavelet coefficients or fractal prediction followed by residual coding to achieve a better rate-distortion (R-D) performance. A criterion of low complexity is derived to evaluate the R-D efficiency of fractal prediction. The superior performance of WFC is demonstrated with extensive experimental results.
We introduce a new image compression framework that combines compression efficiency with speed, and is based on an independent infinite mixture model which accurately captures the space-frequency characterization of the wavelet image representation. Specifically, we model individual image wavelet coefficients as being drawn from an independent generalized Gaussian distribution (GGD) field of zero mean and unknown spatially-varying variances. Based on this model, we develop a powerful estimation-quantization (EQ) framework that consists of: (i) first finding the maximum-likelihood estimate of the individual spatially-varying coefficient field variances based on causal and quantized spatial neighborhood contexts; and (ii) then applying an off-line rate-distortion (R-D) optimized quantization/entropy coding strategy, implemented as a fast lookup table, that is optimally matched to the derived variance estimates. A distinctive feature of our framework is the dynamic partitioning of wavelet data into subsets representing coefficients that are 'predictable' and 'unpredictable' respectively from their quantized causal contexts. The statistical parameters of the 'unpredictable' set in each subband, obtained through a fast, R-D based, simple thresholding first-pass operation, represent the negligible parametric side-information for use in the forward adaptation mode. The combination of the powerful infinite mixture model, the dynamic switching between forward and backward adaptation modes, and the theoretical soundness and speed of the EQ framework lead to a novel, high-performing, and fast image coder that is extremely competitive with the best published coders in the literature across all classes of images and target bit rates of interest, in both compression efficiency and processing speed. For example, our coder exceeds the objective performance of the best zerotree-based wavelet coder at all bit rates for all tested images at a fraction of its complexity. At low to medium bit rates, our preliminary results appear to exceed all reported results in the wavelet image coding literature, to the best of our knowledge.
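Step (i) of the EQ framework can be illustrated for the zero-mean Gaussian special case, where the maximum-likelihood variance estimate is simply the mean of squared causal neighbors (the neighborhood choice and fallback value here are our own assumptions; the paper's GGD case generalizes this):

```python
def causal_variance_estimate(coeffs, r, c):
    """ML variance estimate at (r, c) from quantized causal neighbors
    (left, above-left, above, above-right); zero-mean Gaussian special case."""
    neighbors = []
    for dr, dc in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(coeffs) and 0 <= cc < len(coeffs[rr]):
            neighbors.append(coeffs[rr][cc])
    if not neighbors:   # no causal context available: nominal fallback variance
        return 1.0
    return sum(x * x for x in neighbors) / len(neighbors)
```

Because only quantized, causally available coefficients enter the estimate, the decoder can reproduce it exactly, which is what makes the backward adaptation mode possible.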
Fractal image compression has recently received considerable attention in the literature. In previously published encoding techniques, an image is usually partitioned into nonoverlapping blocks, and each block is encoded by a self-affine mapping from a larger block. The fractal code consists of the description of the image partition, along with that of the image transformation defined as the ordered list of block transformations (τ_i, 0 ≤ i < N). Each block transformation is specified by a set of quantized parameters. Through experiment, we discovered that there exist blocks which have the same transformation parameters as adjacent block transformations. In this paper we propose a region-based transformation that extends the block-based scheme. The concept of the cross search is defined, and a search algorithm for finding the region transformations is given. The results indicate that at the same signal-to-noise ratio, the region-based system achieves a higher compression ratio than the block-based scheme, and that our algorithm is faster than the block-based system because less searching is required.
This paper presents a shape adaptive discrete wavelet transform (SA-DWT) scheme for coding arbitrarily shaped texture. The proposed SA-DWT can be used for object-oriented image coding. The number of coefficients after SA-DWT is identical to the number of pels contained in the arbitrarily shaped image objects. The locality property of the wavelet transform and the self-similarity among subbands are well preserved throughout this process. For a rectangular region, the SA-DWT is identical to a standard wavelet transform. With SA-DWT, conventional wavelet-based coding schemes can be readily extended to the coding of arbitrarily shaped objects. The proposed shape adaptive wavelet transform is not unitary, but the small energy increase is restricted to the boundary of objects in subbands. Two approaches to using the SA-DWT algorithm for object-oriented image and video coding are presented. One combines scalar SA-DWT with the embedded zerotree wavelet (EZW) coding technique; the other extends the normal vector wavelet coding (VWC) technique to arbitrarily shaped objects. Results of applying SA-VWC to real arbitrarily shaped texture coding are given at the end of this paper.
Recent success of linear subband coders is mainly attributed to the invention of data organization strategies exploiting the underlying dependency across subbands thus raising the PSNR 1-3 dB above transform coders. Nevertheless, they suffer from the ringing effect at low bit rates caused by linear filtering. In this paper, we propose a novel hierarchical image decomposition and quantization scheme which exploits the advantages of both linear and morphological filtering in eliminating the ringing effect and maintaining good texture representation. More importantly, the proposed morpho-subband decomposition scheme enables the incorporation of quantization into the decomposition. As a result, all existing algorithms to efficiently represent wavelet coefficients can be applied to representing the so-called morpho-subband coefficients as well. This increases the PSNR by 2-3 dB compared to existing morphological-linear subband decompositions reported in the literature.
The problem of quantizing sub-images of a multiresolution image decomposition while preserving edges is considered. For this purpose, we propose a coding algorithm that exploits both the spatial and frequency location of wavelet coefficients within and across scales. This algorithm is dedicated to low bit rate image coding. In this paper, we develop a new constrained quantizer based on a Lagrangian formulation, called the edge adaptive quantizer. Given a significance map, this algorithm preserves significant coefficients while smoothing elsewhere. This is done by introducing a spatial adaptation term based on a Markov random field. A new criterion based on spatial models and an entropy constraint is then derived. With this new formulation, a practical solution to the multiresolution optimization problem is presented in the form of a bit allocation procedure. An optimal quantizer is constructed by minimizing this new criterion for a target bit rate. Experiments using constrained quantization demonstrate PSNR gains over standard uniform scalar quantization and appreciable visual improvements. A simple extension of the algorithm allows for the use of other scalar quantizers.
A fast rate-distortion (R-D) optimized wavelet packet (WP) transform is proposed for image compression in this research. By analyzing the R-D performance of the quantizer and the entropy coder, we show that the coding distortion D can be modeled as an exponentially decaying function as the coding rate R increases. With this exponential R-D model, it is proved that the constant R-D slope criterion for optimum bit allocation is equivalent to the constant distortion criterion, which can be easily implemented via thresholding. Based on this analytical result, we develop a fast wavelet packet decomposition scheme which is optimized in the R-D sense by comparing simple parameters associated with each wavelet packet band, such as the 1st or 2nd absolute moments. We have performed extensive experiments to demonstrate the performance of an image coder using the proposed R-D optimized wavelet packet transform, and shown that our scheme is highly competitive with well-known state-of-the-art image coders.
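The claimed equivalence can be sketched under the standard exponential model (our illustration, not the paper's exact derivation): the R-D slope is proportional to the distortion itself, so equalizing slopes across bands equalizes distortions.

```latex
% Illustrative sketch: exponential R-D model for one wavelet packet band.
D(R) = \sigma^2\, 2^{-2R}
\qquad\Longrightarrow\qquad
\frac{\partial D}{\partial R} = -2\ln 2\,\sigma^2\, 2^{-2R} = -2\ln 2\, D(R).
% Imposing a common slope -\lambda in every band therefore forces a common
% distortion D = \lambda/(2\ln 2), which a single threshold implements.
```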
The code generated by fractal coding of a digital image provides a resolution-independent representation of the image, as this code can be decoded to generate a digital image at any resolution. When the image is decoded at a size larger than the original encoded image, image details beyond the resolution of the original image are predicted by assuming local self-similarity in the image at different scales. In this paper, we (1) present a formulation of how decoding may be done at a higher resolution, (2) evaluate the accuracy of the predicted details using a frequency analysis of fractally enlarged test images, and (3) propose a method for fractal resolution enhancement without the low-frequency loss of information due to fractal coding.
Wavelet transform which provides a multiresolution representation of images has been widely used in image and video compression. An investigation of wavelet decomposition reveals the cross-correlation among subimages at different resolutions. To exploit this cross-correlation, a new scheme using classified vector quantization to encode wavelet coefficients is proposed in this paper. The original image is first decomposed into a hierarchy of three layers containing ten subimages by discrete wavelet transform. The lowest resolution low frequency subimage is scalar quantized since it contains most of the energy of the wavelet coefficients. All high frequency subimages are vector quantized to utilize the cross-correlation among different resolutions. Vectors are constructed by combining the corresponding coefficients of the high frequency subimages of the same orientation at different resolutions. Classified vector quantization is used to reduce edge distortion and computational complexity. Computer simulations are carried out to evaluate the performance of the proposed method.
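As one concrete illustration of the vector construction described above (a sketch under our own naming and layout; the paper's exact arrangement may differ), same-orientation coefficients at three resolutions can be grouped into one vector, with the coarse position (i, j) mapping to a 2x2 block at the next scale and a 4x4 block at the finest scale:

```python
def cross_scale_vector(coarse, mid, fine, i, j):
    """Build a VQ input vector for coarse-scale position (i, j) by stacking
    the coarse coefficient, the corresponding 2x2 mid-scale block, and the
    corresponding 4x4 fine-scale block (21 components in total)."""
    vec = [coarse[i][j]]
    for r in range(2 * i, 2 * i + 2):
        for c in range(2 * j, 2 * j + 2):
            vec.append(mid[r][c])
    for r in range(4 * i, 4 * i + 4):
        for c in range(4 * j, 4 * j + 4):
            vec.append(fine[r][c])
    return vec
```

Grouping across scales this way lets the vector quantizer exploit the cross-correlation between resolutions directly.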
Poster Session on Motion Estimation for Video Coding II
The rate-constrained block matching algorithm (RC-BMA) introduced in this paper jointly minimizes the DFD variance and the entropy or conditional entropy of motion vectors when determining the motion vectors in low-rate video coding applications, where the contribution of the motion vector rate to the overall coding rate might be significant. The motion vector rate versus DFD variance performance of RC-BMA employing size KxK blocks is shown to be superior to that of the conventional minimum distortion block matching algorithm (MD-BMA) employing size 2Kx2K blocks. Constraining the entropy or conditional entropy of motion vectors in RC-BMA results in smoother and more organized motion vector fields than those output by MD-BMA. The motion vector rate of RC-BMA can also be fine-tuned to a desired level for each frame by adjusting a single parameter.
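The core of any rate-constrained matcher is a Lagrangian selection rule. A toy sketch (candidate costs, names, and the lambda weighting are our illustrative assumptions, not RC-BMA's exact formulation):

```python
def rc_block_match(candidates, dfd_cost, mv_bits, lam):
    """Rate-constrained choice: pick the motion vector v minimizing
    J(v) = D(v) + lambda * R(v), where D is the DFD cost of v and
    R the bits needed to code v. lam = 0 reduces to plain MD-BMA."""
    return min(candidates, key=lambda v: dfd_cost[v] + lam * mv_bits[v])
```

Raising lambda penalizes expensive vectors, which is what produces the smoother, more organized motion fields described above.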
We propose a novel method for estimating the 3D motion and dense structure of an object from its two 2D images. The proposed method is an iterative algorithm based on the theory of projections onto convex sets (POCS) that involves successive projections onto closed convex constraint sets. We seek a solution for the 3D motion and structure information that satisfies the following constraints: (i) rigid motion--the 3D motion parameters are the same for each point on the object; (ii) smoothness of the structure--depth values of neighboring points on the object vary smoothly; (iii) temporal correspondence--the intensities in the given 2D images match under the 3D motion and structure parameters. We mathematically derive the projection operators onto these sets and discuss the convergence properties of successive projections. Experimental results show that the proposed method significantly improves the initial motion and structure estimates.
In block-based video coding, the current frame to be encoded is decomposed into blocks of the same size, and a motion vector is used to improve the prediction for each block. The motion vectors and the difference frame, which contains the blocks' prediction errors, must be encoded with bits. Typically, choosing a smaller block size will improve the prediction and hence decrease the number of difference frame bits, but it will increase the number of motion bits since more motion vectors need to be encoded. Not surprisingly, there must be some value for the block size that optimizes the tradeoff between motion and difference frame bits, and thus minimizes their sum. Despite the widespread experience with block-based video coders, there is little analysis or theory that quantitatively explains the effect of block size on encoding bit rate, and ordinarily the block size for a coder is chosen based on empirical experiments on video sequences of interest. In this work, we derive a procedure to determine the optimal block size that minimizes the encoding rate for a typical block-based video coder. To do this, we analytically model the effect of block size and derive expressions for the encoding rates for both motion vectors and difference frames, as functions of block size. Minimizing these expressions leads to a simple formula that indicates how to choose the block size in these types of coders. This formula also shows that the best block size is a function of the accuracy with which the motion vectors are encoded and several parameters related to key characteristics of the video scene, such as image texture, motion activity, interframe noise, and coding distortion. We implement the video coder and use our analysis to optimize and explain its performance on real video frames.
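The motion-bits versus difference-frame-bits tradeoff can be made concrete with a toy numerical model (the residual-bits function below is invented purely for illustration; the paper derives analytical expressions instead):

```python
def total_rate(block_size, width, height, bits_per_mv, residual_bits_per_pel):
    """Total bits = motion bits (one vector per block) + difference-frame bits.
    residual_bits_per_pel(b) is a stand-in model: prediction worsens, and
    residual bits per pel grow, as the block size b grows."""
    n_blocks = (width // block_size) * (height // block_size)
    return n_blocks * bits_per_mv + width * height * residual_bits_per_pel(block_size)

def best_block_size(sizes, width, height, bits_per_mv, residual_bits_per_pel):
    """Pick the block size minimizing the modeled total encoding rate."""
    return min(sizes, key=lambda b: total_rate(b, width, height,
                                               bits_per_mv, residual_bits_per_pel))
```

Small blocks are dominated by motion-vector overhead and large blocks by residual bits, so the minimum sits in between, exactly the tradeoff the paper's formula characterizes analytically.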
Global motion estimation and compensation are important research issues in video compression. The main difficulty in global motion estimation resides in the disturbance of independently moving objects. The algorithm presented in this paper exploits global motion information not only from stationary objects and the image background, but also from independently moving objects. Simulation results show that the new algorithm is more robust to the disturbance of independently moving objects, and computationally faster than an algorithm based on least-squares approximation.
We present an algorithm to segment image sequences from motion information. A dense vector field estimated by a Wiener-based pel-recursive method is the key to separating a viewed scene into regions with different apparent displacement, according to a four-parameter motion model. A preprocessing stage using mathematical morphology enhances pel-recursive motion estimation. The proposed segmentation model, based on Markov random field theory, considers, besides the motion field, other information sources that help describe the problem more accurately. The maximum a posteriori criterion is used for the optimization of the solution, performed with a deterministic approach. The complete segmentation algorithm includes initializing, region numbering and labeling, parameter estimation of the motion model in each region, and optimization of the segmentation field. Results on synthetic and real sequences are shown.
We have previously developed an optical-flow-based motion estimator that produces dense, spatially-coherent motion fields under bit-rate constraints. These motion estimates target video coding applications, including post-processing applications in which no additional motion estimation step at the decoder is required. In frame-interpolation applications, significant improvements have been obtained over methods that rely upon the standard block-matching algorithm for motion estimation. We now extend these ideas to the case of video coders that use bidirectionally predicted B frames; the use of B frames provides temporal scalability and good compression performance. We develop a novel scheme to handle the problems caused by the presence of covered/uncovered regions. The scheme uses a label field to optimally weight the contributions of the forward and backward predictions. The label field is dense, with label values in the range (0, 1); we introduce a multiscale algorithm for jointly estimating and compressing the label field. In coding experiments on the Susie sequence, the use of label fields resulted in substantial visual and PSNR gains, especially in the fast moving parts of the sequence.
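The label-field weighting described above amounts to a per-pixel convex blend of the two predictions. A minimal sketch (names and the flat-list pixel representation are our own):

```python
def bidir_predict(fwd, bwd, labels):
    """Per-pixel blend of forward and backward predictions: a label of 1
    fully trusts the forward reference (e.g. a region about to be covered),
    0 fully trusts the backward one, and values in between mix the two."""
    return [a * f + (1.0 - a) * b for f, b, a in zip(fwd, bwd, labels)]
```

Because the labels are dense and continuous in (0, 1), covered/uncovered regions can switch smoothly between references instead of forcing a hard per-block forward/backward decision.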
Digital transmission of video signals and block-based coding/decoding schemes produce new artifacts such as blocking, dirty-window, ringing, and mosquito effects. These artifacts become worse with decreasing MPEG-2 data rates. Therefore the reduction of MPEG artifacts becomes an attractive feature for digital TV receivers. On the other hand, an important feature for digital receivers is the performance of their postprocessing techniques, such as object recognition, motion estimation, vector-based upconversion, and noise reduction, on MPEG signals decoded in a receiver-based module called a 'set top box'. In this paper, different models dealing with the interaction between the set top box and the digital receiver are discussed, and the influence of MPEG artifacts on postprocessing is presented. A vector-based upconversion algorithm which applies nonlinear center-weighted median filters is presented. Assuming a 2-channel model of the human visual system with different spatio-temporal characteristics, errors in the separated channels can be orthogonalized and avoided by an adequate splitting of the spectrum. This yields a very robust, vector-error-tolerant upconversion method which significantly improves the interpolation quality. This paper also describes a concept for temporal recursive noise and MPEG-artifact filtering on TV images based on visual noise perception characteristics. Different procedures in the spatial subbands lead to results well matched to the requirements of the human visual system. Using a subband-based noise filter, temporally non-correlated MPEG artifacts can be significantly reduced. Image analysis using object recognition for video postprocessing is becoming more important. Therefore a morphological, contour-based multilevel object recognition method which remains robust even in strongly corrupted MPEG-2 images is also introduced.
In this article, we present a new NTSC system based on multidimensional crosstalk-free transmultiplexer theory. The system is truly crosstalk-free and is compatible with existing television sets. The new encoded NTSC composite signal can be demodulated with slight degradation by a conventional television receiver, with some improvement by a comb-filter-equipped television, and with best performance by the new decoder. We use new sampling structures for luminance and chrominance signals in order to obtain near-perfect reconstruction. The detailed structure of the proposed crosstalk-free NTSC system is presented. The NTSC encoder is composed of a decimation stage followed by a near-perfect-reconstruction transmultiplexer encoder; the NTSC decoder is composed of the transmultiplexer's decoder followed by an interpolation stage. We show structures of multidimensional two-channel FIR filter banks which allow near-perfect reconstruction. Such structures lead to exactly zero crosstalk between luminance and chrominance signals. Special attention is given to the design of the filters in the system, since they must maintain compatibility with existing receivers. A design example and comparison with existing NTSC systems are presented.
Video transmission over wireless links is an emerging application which involves a time-varying channel. Compared to other transmission media, wireless links suffer from limited bandwidth, and are more likely to see their performance degrade due to multipath fading. Therefore error control mechanisms, which can achieve better video quality with the available bandwidth and recover from errors due to channel degradation, are very important in wireless video transmission systems. Many of the proposed wireless communications systems are likely to be two-way, so that a return channel can convey information to the transmitter about the channel state. Recent research has considered ways of improving transmission reliability by making use of the feedback channel for 'closed loop' error control, including various forms of retransmission. In this paper we propose a rate control algorithm based on dynamic programming combined with automatic repeat request (ARQ) as the error control mechanism. We formalize the constraints imposed by the real-time characteristics of video. We show how, when an appropriate model of the channel is available, the overall robustness of the system can be improved through rate control at the source using the channel state information conveyed by the ARQ acknowledgements.
In Japan, with the start of the personal handy phone system service in 1995, there has been increased activity aimed at developing mobile-multimedia communication services that take advantage of the system's high 32-kbps speed. Since wireless transmission paths suffer from fading errors, an important task in the implementation of mobile-multimedia communication services will be to guarantee high-efficiency communication despite the error-laden transmission channel. We present a study of a highly error-tolerant retransmission-control method.
Asynchronous transfer mode (ATM) is emerging as the standard protocol for image and video transmission. There are virtually no bandwidth limitations nor restrictions on the size of the operating area. However, the main problem lies in the non-secured transmission when native ATM applications are implemented. This has induced a new way of encoding images in which the redundancy is generated inside the codec. In this paper, we present the Mojette transform, which generates redundancy at the highest level of the coder in order to transmit image data safely. Block and wavelet implementations associated with the Mojette transform are presented and compared, not only from the coder point of view but also with respect to the source and channel characteristics. For this specific case we also present the asynchronous Mojette reconstruction. An adapted object-oriented model has been developed accordingly.
This paper presents an effective and simple video decoding method for H.263 over noisy channels that does not use any kind of feedback information. We show through computer simulations that the negotiable options of H.263 provide no error robustness over error-prone channels, and we propose a simple error detection/correction method. The key idea is that sacrificing some coding efficiency by restricting certain functionalities of H.263 reduces the set of possible administrative information, which allows a decoder to easily check whether the expected syntax has been received and to attempt syntax correction. Applying this method, syntax errors such as 'MCBPC' errors can be easily detected and corrected, which is crucial for completely removing visually annoying 'green/pink' block artifacts. Since the proposed method is completely H.263-compatible, the decoder is very good at finding H.263 syntax errors caused by channel errors. The channel model we chose was DECT2 with a bit-error rate of 2.1 × 10^-3.
Mobile communication channels are subject to multipath fading, which in turn results in burst errors in the transmitted bit stream. When the data is source-coded video, such errors cause rapid deterioration of the decoded image sequence. This paper presents a method for reducing the effects of such errors on the H.263-coded bit stream using rate-compatible punctured convolutional (RCPC) codes for unequal error protection, and describes some modifications made to the source decoder for error concealment. Numerical results are obtained for RCPC codes over Gaussian and Rayleigh fading channels, and they are compared with a sequence coded using the burst-error-correcting block code RS(15,11) and one without any form of channel coding.
This paper addresses the issue of introducing information loss in a video sequence prior to coding, with the objective of maintaining visual quality while lowering the bit rate required to code the video. We argue that managing perceptually meaningful information is better accomplished in the motion domain than in the pixel-intensity domain. Based on recent advances in motion estimation and modeling, we introduce a representation of video data using motion information wherein every frame of a sequence is replaced with a field of interframe correspondences. Knowledge of the previous frame along with the field of interframe correspondences is sufficient to reconstruct the current frame. Preprocessing is accomplished by filtering the interframe correspondences and generating processed video using the filtered correspondence field. Simulations using the MPEG video coding standard reveal that rate gains of up to 20 percent are possible when filtering such that the processed and original video are of similar visual quality.
Considering a video signal that has been degraded by a PAL codec and noisy transmission over a satellite channel, two digital filtering schemes are presented for reducing the different resulting signal distortions before MPEG-2 encoding. The first deals with cross-effects and transmission noise using a single non-linear filtering box, while the second consists of several linear processing stages, each devoted to one kind of distortion. Both filtering approaches are discussed, and their performances before and after the MPEG-2 codec are compared using objective (peak signal-to-noise ratio) as well as subjective (visual quality) criteria.
Block-based transform coding employing the discrete cosine transform (DCT) is a popular technique in image and video compression. We consider Wiener-filter-based restoration for transform-coded images and motion video. The scheme operates at the decoder end. It capitalizes on the residual correlation among quantized DCT coefficients and the quantization errors. The scheme, termed error pattern compensation or EPC in short, is derived by simplifying and extending related work of other researchers. When applied to motion video, it is activated for intraframe-coded macroblocks only. It first classifies an encoded image block according to some visually meaningful features and vector quantization. Different Wiener filters can be designed for different classes of input blocks. Experimental results show that the scheme yields performance improvement in the range of a few percent in MSE or PSNR. It is more effective in restoring image blocks with greater quantizing distortion. Results also show that intraframe-coded pictures usually benefit the most from EPC, followed by predictive-coded pictures and then by bidirectional-predictive-coded pictures. At times, the last two types of pictures show performance degradation rather than gain. Possible reasons are discussed.
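The restoration idea can be illustrated with classical scalar Wiener shrinkage of dequantized DCT coefficients at the decoder, modeling the quantization error as uniform with variance step²/12. This is a simplified per-coefficient sketch, not the paper's classified, vector-quantization-driven filter bank.

```python
import numpy as np

def wiener_restore_coeffs(coeffs, signal_var, quant_step):
    """Shrink dequantized DCT coefficients by the scalar Wiener gain
    var_s / (var_s + var_q), where the quantization-error variance is
    modeled as quant_step^2 / 12 (uniform quantization noise)."""
    q_var = quant_step ** 2 / 12.0
    gain = signal_var / (signal_var + q_var)
    return gain * coeffs
```

When the coefficient variance dwarfs the quantization noise the gain approaches 1 and the coefficient passes through unchanged; coarsely quantized, low-energy coefficients get shrunk toward zero, which is consistent with the paper's observation that blocks with greater quantization distortion benefit most.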
A global brightness-fluctuation compensation (GBC) scheme is proposed to improve coding efficiency for video scenes that contain global brightness fluctuations caused by fade-in/out, camera-iris adjustment, and so on. In this scheme, a set of two brightness-fluctuation parameters, which represent the contrast and offset components of the brightness fluctuation over the whole frame, is first estimated. The brightness fluctuation is then compensated using these parameters. Furthermore, a block-by-block ON/OFF control method for GBC is introduced to improve coding performance even for scenes including local fluctuations in brightness caused by camera flashes, spotlights, and the like. In this method, GBC is performed only on the blocks where the sum of squared errors produced with GBC is less than that produced without GBC. Simulation results show that the proposed GBC scheme with the ON/OFF control method greatly improves coding efficiency, especially for sequences with considerable global brightness fluctuation.
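As a sketch of the two-parameter model, the contrast a and offset b can be fit by least squares over the frame, and each block then keeps or drops the compensation depending on which choice yields the smaller squared error (the ON/OFF control). A minimal NumPy illustration, with block size and fitting details chosen for exposition rather than taken from the paper:

```python
import numpy as np

def estimate_gbc_params(cur, ref):
    """Least-squares fit of cur ~ a * ref + b over the whole frame
    (a = contrast, b = offset of the global brightness fluctuation)."""
    a, b = np.polyfit(ref.astype(np.float64).ravel(),
                      cur.astype(np.float64).ravel(), 1)
    return a, b

def compensate_blockwise(cur, ref, a, b, bs=16):
    """Apply GBC only on blocks where it lowers the squared error
    (the block-by-block ON/OFF control); return the compensated
    reference and the ON/OFF decision map."""
    comp = ref.astype(np.float64).copy()
    h, w = cur.shape
    on_map = np.zeros((h // bs, w // bs), dtype=bool)
    for i in range(0, h, bs):
        for j in range(0, w, bs):
            c = cur[i:i+bs, j:j+bs].astype(np.float64)
            r = ref[i:i+bs, j:j+bs].astype(np.float64)
            sse_off = np.sum((c - r) ** 2)
            sse_on = np.sum((c - (a * r + b)) ** 2)
            if sse_on < sse_off:
                comp[i:i+bs, j:j+bs] = a * r + b
                on_map[i // bs, j // bs] = True
    return comp, on_map
```

For a pure fade (cur = a·ref + b everywhere) every block switches GBC on; a local flash leaves the globally fitted parameters a poor match in the affected blocks, so those blocks fall back to plain prediction.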
In this paper, we propose a novel method of arbitrarily focused image acquisition using multiple differently focused images. First, we describe our previous select-and-merge method for all-focused image acquisition. We can get good results using this method, but it is not easy to extend it to generating arbitrarily focused images. Then, based on the assumption that the depth of the scene changes stepwise, we derive a reconstruction formula relating the desired arbitrarily focused image to the multiple acquired images; we can reconstruct the arbitrarily focused image by iterative use of the formula. We also introduce coarse-to-fine estimation of the PSFs of the acquired images. We show that we can reconstruct arbitrarily focused images for a natural scene. In other words, we can simulate virtual cameras and synthesize images focused at arbitrary depths.
This paper introduces an iterative regularized approach to obtain a high resolution video sequence. A multiple input smoothing convex functional is defined and used to obtain a globally optimal high resolution video sequence. A mathematical model of multiple inputs is described by using the point spread function between the original and bilinearly interpolated images in the spatial domain, and motion estimation between frames in the temporal domain. Properties of the proposed smoothing convex functional are analyzed. An iterative algorithm is utilized for obtaining a solution. The regularization parameter is updated at each iteration step from the partially restored video sequence. Experimental results demonstrate the capability of the proposed approach.
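A toy 1-D version of such an iteratively regularized restoration might look as follows, with the regularization parameter recomputed at every step from the partially restored estimate. The matrices, step size, and the specific parameter-update rule are illustrative assumptions, not the paper's exact multiple-input functional:

```python
import numpy as np

def regularized_restore(y, H, C, iters=300, step=1e-3):
    """Gradient descent on ||H x - y||^2 + lam * ||C x||^2, where H is the
    degradation (blur) matrix and C a smoothing operator such as a
    Laplacian. lam is re-derived at each iteration from the current,
    partially restored estimate (a common residual-to-smoothness ratio)."""
    x = H.T @ y          # crude initial estimate
    CtC = C.T @ C
    for _ in range(iters):
        resid = H @ x - y
        lam = (resid @ resid) / (1e-12 + x @ CtC @ x)
        grad = 2.0 * (H.T @ resid) + 2.0 * lam * (CtC @ x)
        x = x - step * grad
    return x
```

With a circulant blur and a periodic Laplacian the iteration stays in the Fourier eigenspaces of the operators and steadily reduces the data residual while the smoothness term keeps noise amplification in check.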
Multiframe resolution enhancement algorithms are used to estimate a high-resolution video still (HRVS) from several low-resolution frames, provided that objects within the image sequence move with subpixel increments. A Bayesian multiframe enhancement algorithm is presented to compute an HRVS using the spatial information present within each frame as well as the temporal information present due to object motion between frames. However, the required subpixel-resolution motion vectors must be estimated from low-resolution and noisy video frames, resulting in an inaccurate motion field which can adversely impact the quality of the enhanced image. Several subpixel motion estimation techniques are incorporated into the Bayesian multiframe enhancement algorithm to determine their efficacy in the presence of global data transformations between frames and independent object motion. Visual and quantitative comparisons of the resulting high-resolution video stills computed from two video frames and the corresponding estimated motion fields show that the eight-parameter projective motion model is appropriate for global scene changes, while block matching and Horn-Schunck optical flow estimation each have their own advantages and disadvantages when used to estimate independent object motion.
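The eight-parameter projective model mentioned above maps image coordinates through a homography; a minimal sketch of the coordinate mapping follows (the parameter ordering is a common convention, not necessarily the paper's):

```python
import numpy as np

def projective_warp_coords(points, p):
    """Map (x, y) points through the eight-parameter projective model
        x' = (p0*x + p1*y + p2) / (p6*x + p7*y + 1)
        y' = (p3*x + p4*y + p5) / (p6*x + p7*y + 1),
    which captures global scene motion (pan, zoom, rotation, perspective)
    between frames. points has shape (N, 2)."""
    x, y = points[:, 0], points[:, 1]
    w = p[6] * x + p[7] * y + 1.0
    return np.stack([(p[0] * x + p[1] * y + p[2]) / w,
                     (p[3] * x + p[4] * y + p[5]) / w], axis=1)
```

With p6 = p7 = 0 the model reduces to an affine warp, and with only p2, p5 nonzero to a pure translation, which is why it subsumes the simpler global motion models.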
This paper proposes a new methodology for videoconference applications in which several different sites can be involved. In such applications, it is desirable for each user to watch a single image that gives the impression that everybody is in the same virtual room. Furthermore, since only a very limited transmission bandwidth can be expected to be available, it is important to transmit only useful information. For these reasons, we have developed a technique that consists in the creation of a hybrid synthetic/natural scene. This hybrid scene contains the real images of each interlocutor of the multi-site videoconference. This reduces the bitrate, since only the regions of interest contained in the real video data must be coded and transmitted. In practice, the background of the scene, which generally has no interest for users, is not coded. The extraction of these regions of interest is performed by a new detection algorithm based on a reference image. In order to manage occlusion and collision problems in the hybrid images, a 3D positioning strategy for 2D real objects has been developed. Experimental results are presented on real videoconference-like image sequences.
In this work we propose a method for adaptive quantization of motion compensation residuals in an H.263 coding scheme, based on color information. In the proposed strategy, the perceptual distance between the original and the predicted version of each macroblock is evaluated in the perceptually uniform color space L*a*b*. Exploiting the properties of perceptually uniform color spaces, the color distance is easily evaluated by means of the Euclidean distance, and its average value is compared with a threshold in order to choose the suitable quantization. Simulation results show that the adaptive quantization is very efficient in reducing the bit rate when the sequences exhibit slow, regular motion, and that the quantization performance is upper-bounded by the motion compensation performance. In fact, for sequences with a low-to-medium amount of motion, block-based motion compensation efficiently predicts the current frame from adjacent ones, and the residuals are due to noise or to small color variations. In these cases, the adaptive quantization can provide significant bit-rate reduction without subjective quality degradation. The proposed strategy is strictly compatible with the H.263 coding standard; however, it is quite general and can be usefully exploited in different coding frameworks, based on various motion compensation techniques, whenever motion compensation residuals are evaluated.
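The per-macroblock decision can be sketched as a mean ΔE computation in L*a*b* followed by a threshold test; the threshold and quantizer step values below are illustrative placeholders, not the paper's tuned parameters:

```python
import numpy as np

def mean_delta_e(orig_lab, pred_lab):
    """Mean Euclidean color distance (Delta-E) between an original and a
    predicted macroblock, both given in L*a*b* with shape (16, 16, 3).
    In a perceptually uniform space this plain Euclidean distance tracks
    perceived color difference."""
    diff = orig_lab.astype(np.float64) - pred_lab.astype(np.float64)
    return float(np.mean(np.sqrt(np.sum(diff ** 2, axis=-1))))

def choose_quantizer(orig_lab, pred_lab, threshold=2.3, q_fine=8, q_coarse=16):
    """If the prediction is already perceptually close (mean Delta-E below
    the threshold), the residual can be quantized coarsely without visible
    loss; otherwise use the finer quantizer. All constants are illustrative."""
    return q_coarse if mean_delta_e(orig_lab, pred_lab) < threshold else q_fine
```

A well-predicted macroblock (small ΔE) thus spends fewer bits, which is exactly where the paper reports its gains on slow, regular motion.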
One efficient way to compress digital images is subband coding. Subband coding using vector quantization could be a competitor to DCT-like image compression schemes. In this paper we describe an image sequence compression algorithm based on difference-image coding techniques, with block motion compensation, difference-image segmentation into rectangles using quadtrees, decomposition of the rectangles into subbands, and vector quantization of the subbands. The vector quantization scheme uses multiple vector quantizers, which yields a better bitrate allocation. The quantization of each subband is performed by three different tree-structured vector quantizers (TSVQ) at variable tree depths. The rate-distortion curves of all the rectangles are scanned to find the best global R-D combination. The best combination parameters are coded and used to quantize the subbands of all the rectangles. The results show slightly better performance for this scheme compared with optimal scalar quantization of the subbands. The coding speed of this VQ scheme is only three times slower than a single vector quantization per vector.
A bottom-up approach to wavelet prediction with implicit optical flow is introduced. The presented method is composed of the recursive application of a pel-recursive approach and a wavelet transform. It is compared with a top-down approach theoretically, and with the existing block-matching approach (BMA) experimentally. The presented method shows better performance than BMA for real image sequences with small motion. It also yields smaller prediction error for image sequences with large motion when prior motion information, such as the optical flow of the previous frame or the prediction frame of BMA, is available.
We consider optimal encoding of a sequence of video units under a given set of rate constraints, which may arise from finite codec delay, finite channel capacity, and finite codec buffer sizes. A Lagrange-multiplier approach is employed, and some useful properties of the optimal Lagrange-multiplier solution are obtained under the assumption that the allowed video data rates are continuous. Based on these properties, we derive two solution algorithms for discrete allocation. The algorithms are more efficient than those presented to date. The solution is optimal when the distortion-rate relations of the video units are convex and the selectable rates of the video units are uniformly spaced with the same granularity. When these conditions do not hold, the Lagrange-multiplier solution may be suboptimal, but it can be improved or optimized by a search around the solution.
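For the discrete case, the Lagrangian machinery works as follows: for a fixed multiplier λ each video unit independently picks the operating point minimizing D + λR, and bisection on λ drives the total rate to the budget. A minimal sketch under the convexity assumption stated above (the search bounds and tolerance are implementation choices):

```python
def lagrangian_allocate(rd_points, budget, tol=1e-6):
    """rd_points: one list of (rate, distortion) options per video unit.
    For a given lam each unit independently minimizes D + lam * R;
    bisection on lam meets the total-rate budget. Optimal when each
    unit's R-D set is convex."""
    def pick(lam):
        choice = [min(opts, key=lambda rd: rd[1] + lam * rd[0])
                  for opts in rd_points]
        return choice, sum(r for r, _ in choice)

    lo, hi = 0.0, 1e6          # lo over-spends, hi under-spends
    while hi - lo > tol:
        mid = (lo + hi) / 2
        _, rate = pick(mid)
        if rate > budget:
            lo = mid           # too many bits: penalize rate more
        else:
            hi = mid
    return pick(hi)[0]
```

Because every unit faces the same λ, the solution equalizes marginal rate-distortion trade-offs across units, which is the continuous-rate property the discrete algorithms exploit.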
Rate-distortion based bit allocation algorithms have previously been proposed to yield minimum distortion for a given bit rate in the framework of MPEG. However, they are impractical due to the huge computation required to generate the rate-distortion curve for each image block. In this paper, we propose a fast piecewise-linear approximation of the rate-distortion function that makes rate-distortion based bit allocation close to practical. By using the proposed fast recursive algorithm to compute selected points on the rate-distortion function and then applying linear interpolation, we show that the computation can be reduced by a factor of approximately 17. A simulation is performed in which rate-distortion based bit allocation using a bisection approach is applied to an MPEG-1 coder. A significant gain of 1.15 dB in PSNR is found to be possible, but the proposed fast algorithm achieves a PSNR gain of only 0.64 dB, suggesting that further work is needed.
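The core trick, evaluating the true rate only at a few anchor quantizer scales and interpolating linearly in between, can be sketched in a few lines (the anchor placement here is illustrative, not the paper's choice):

```python
import numpy as np

def piecewise_rd(anchor_q, rate_fn, query_q):
    """Evaluate the expensive exact rate function rate_fn only at the
    anchor quantizer scales, then answer all other queries by linear
    interpolation between the anchors. anchor_q must be increasing."""
    anchor_q = np.asarray(anchor_q, dtype=float)
    anchor_r = np.array([rate_fn(q) for q in anchor_q])
    return np.interp(query_q, anchor_q, anchor_r)
```

The speed-up comes from replacing a full R-D sweep per block with a handful of exact evaluations; the cost is the interpolation error between anchors, which is consistent with the gap between the 1.15 dB optimum and the 0.64 dB achieved by the fast algorithm.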
Multigeneration is the repeated compression and decompression of images. Under a compression scheme like transform coding, reconstructed images suffer further degradation at each generation, even though there is no manipulation of the image data. This paper describes the multigeneration characteristics of transform-coded images. We also study motion-compensated coding in a multigeneration environment. We present five mechanisms that contribute to the continued degradation in multigeneration: pixel-domain quantization (PDQ), pixel-domain clipping (PDC), compression control parameter variation (CCPV), motion vector re-estimation (MVR), and error propagation due to motion compensation (EPMC). For PDQ, we show that it is not the step size of the DCT-domain or pixel-domain quantizers that governs the saturation of degradation in multigeneration, but rather the ratio between these two quantizers. We observe that PDC mainly affects the DC coefficient. In CCPV and MVR, multigeneration error is reduced when the quantization scaling parameters and/or the motion vectors of the first generation are used for each subsequent generation. For EPMC, multigeneration errors in reference pictures propagate to the frames predicted from them. Consequently, EPMC only multiplies the effect of the other mechanisms.
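The PDQ point can be illustrated with a tiny requantization experiment: with an unchanged quantizer the second generation adds no new error, so the continued degradation seen in practice must come from the interacting mechanisms listed above (pixel-domain rounding and clipping, parameter changes, motion re-estimation). A minimal sketch, with transforms omitted for clarity:

```python
import numpy as np

def requantize(x, step):
    """Uniform quantize/dequantize, the lossy core of one compression
    generation (DCT and entropy coding omitted for clarity)."""
    return np.round(x / step) * step

rng = np.random.default_rng(1)
frame = rng.uniform(0.0, 255.0, (8, 8))

# Generation 1 introduces quantization error; generation 2 with the SAME
# step finds the data already on the quantizer grid and is lossless.
gen1 = requantize(frame, 10.0)
gen2 = requantize(gen1, 10.0)
```

Interposing a second, incommensurate quantizer between generations (the pixel-domain quantizer of the paper) breaks this idempotence, which is why the ratio between the two step sizes, not either step size alone, governs when the degradation saturates.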
This paper presents a simple bitrate control method to prevent abrupt quality degradation after a scene change. After a scene change, quality degradation occurs due to the poor temporal prediction between pictures before and after the change. We predict the coding complexity of a picture using the spatial variance before the DCT and a spectral flatness measure. From the predicted coding complexity, we show that the rate-distortion relation of an image can be approximated by an exponential function. When a scene changes, the picture target bits are adjusted in the direction that minimizes the distortion in a GOP, using the rate-distortion relations for each P-picture. Since a bit shortage could occur, the proposed method extends the current GOP into the next. The algorithm can easily be applied to existing MPEG codecs and real-time applications. Compared with the MPEG-2 TM5 rate control algorithm, the proposed algorithm shows a 0.5 to approximately 2.5 dB gain in PSNR and a small fluctuation in quality after scene changes.
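Under an exponential model D(R) = a·e^(−bR), minimizing total GOP distortion subject to a bit budget equalizes the marginal distortions a·b·e^(−bR) across pictures, and a bisection on the common multiplier yields the per-picture targets. A sketch under that model (the clamping at zero and the search bounds are implementation choices, and the model parameters here are hypothetical):

```python
import math

def allocate_gop(params, budget):
    """params: one (a, b) pair per picture, with model D(R) = a * exp(-b*R).
    Equal marginal distortion a*b*exp(-b*R) = lam gives
    R = max(0, ln(a*b / lam) / b); geometric bisection on lam meets
    the GOP bit budget."""
    def rates(lam):
        return [max(0.0, math.log(a * b / lam) / b) for a, b in params]

    lo, hi = 1e-12, max(a * b for a, b in params)  # lo over-spends, hi spends 0
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if sum(rates(mid)) > budget:
            lo = mid
        else:
            hi = mid
    return rates(hi)
```

A picture with higher predicted complexity (larger a) receives more bits at the shared multiplier, which is the direction in which the paper adjusts targets after a scene change.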
This paper presents a solution for image partition encoding, usable for region-based image or video compression. This solution consists of several elements which can eventually be used separately in other contexts, such as the recent VOP contour encoding defined within the ISO MPEG-4 standardization effort. Namely, these elements are the graph representation of the contour network topology on the one hand, and the B-spline approximation of the contour geometry, carried out on a purely geometric criterion, on the other hand. The role of the graph representation is twofold: to enable the regeneration of region labels at the decoder side without needing to send them along with the contour geometry, which could otherwise be jeopardized by contour approximation errors, and to save the coding of the contours' starting points. Such label regeneration is based on the extraction, from the graph structure alone, of the list of consecutive arcs corresponding to the external region boundary. The algorithm for this extraction is given, as well as the encoding of the graph structure. Compression of the geometric information is obtained through contour approximation by B-spline curves. This approximation combines a least-mean-squares curve fit with a gradient-based geometric curve evolution starting from this first approximation. Finally, a solution is proposed for the encoding of the resulting B-spline control points.
Most program material in the movie and broadcasting industries is still in analog film or tape format, which typically contains random noise originating from film, CCD cameras, and tape recording. The performance of an MPEG-2 encoder may be significantly degraded by this noise. It is also affected by the scene type, which includes spatial and temporal activity. The statistical properties of noise originating from cameras and tape players are analyzed, and models for the two types of noise are developed. The relationships between the noise, the scene type, and encoder statistics for a number of MPEG-2 parameters such as motion vector magnitude, prediction error, and quantizer scale are discussed. This analysis is intended to be a tool for designing robust MPEG encoding algorithms, such as preprocessing and rate control.
A novel forward rate control (FRC) strategy based on rate-distortion theory is proposed in this paper. The new FRC allocates the target number of bits for each macroblock according to the local spatial activity as well as the communication channel bitrate, so that an optimal bit allocation can be obtained from the modeled image data and from the buffer fullness. Subjective quality tests on the flower garden and football sequences show that the proposed FRC strategy improves the quality of the reconstructed pictures considerably over the conventional one. Tests based on the objective criterion give an average 1 dB PSNR gain over the TM5 buffer control strategy.
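One simple embodiment of activity-driven allocation is a proportional split of the frame budget by local activity (block variance, say); this is an illustrative rule for the general idea, not necessarily the paper's exact mapping:

```python
import numpy as np

def allocate_macroblock_bits(activities, frame_budget):
    """Split a frame's bit budget over macroblocks in proportion to a
    local spatial activity measure (e.g. block variance). Busy blocks,
    which are expensive to code, receive proportionally more bits."""
    act = np.asarray(activities, dtype=float)
    return frame_budget * act / act.sum()
```

A forward scheme like this fixes the targets before encoding from the modeled data, rather than reacting to buffer drift afterwards as TM5-style feedback control does.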