This recording is for the presentation titled "Speeding up sparse signal recovery using sparse-graph codes", part of the SPIE symposium on "Optical Engineering + Applications".
We design a distributed multi-channel P2P Video-on-Demand (VoD) system using "plug-and-play" helpers.
Helpers are heterogeneous "micro-servers" with limited storage, bandwidth, and number of users they can serve
simultaneously. Our proposed system has the following salient features: (1) it jointly optimizes over helper-user
connection topology, video storage distribution and transmission bandwidth allocation; (2) it minimizes server
load, and is adaptable to varying supply and demand patterns across multiple video channels irrespective of video
popularity; and (3) it is fully distributed and requires little or no maintenance overhead. The combinatorial nature
of the problem and the system's demand for distributed algorithms make the problem uniquely challenging. By
utilizing Lagrangian decomposition and Markov chain approximation based arguments, we address this challenge
by designing two distributed algorithms running in tandem: a primal-dual storage and bandwidth allocation
algorithm and a "soft-worst-neighbor-choking" topology-building algorithm. Our scheme provably converges to
a near-optimal solution, and is easy to implement in practice. Packet-level simulation results show that the
proposed scheme achieves minimum server load under highly heterogeneous combinations of supply and demand
patterns, and is robust to system dynamics of user/helper churn, user/helper asynchrony, and random delays in
the network.
Bandwidth-efficient video file synchronization between remote clients is an important task. When heterogeneous mobile clients want to synchronize their local video data to that of a remote party at a desired resolution and distortion level, it is wasteful and unnecessary to retransmit the entire video data, especially when the differences are minor while the clients are limited in transmission bandwidth. We present VSYNC (video-sync), an incremental video file synchronization protocol that automatically detects and transmits differences between the video files without prior knowledge of what is different. VSYNC generalizes the popular universal file synchronization tool rsync to a semantics-aware utility that handles synchronization of video data. An important attribute of VSYNC is that it allows synchronization to within some quantitative distortion constraint. VSYNC can be easily embedded in a codec or transcoder, and can be used to synchronize videos encoded with different parameters or stored in different, possibly proprietary, formats. A hierarchical hashing scheme is designed to compare the video content at the remote ends, while a lossy distributed video coding framework is used to realize compression gains in the update steps. Experimental results of three heterogeneous mobile clients synchronizing to an updated video file at the remote server validate the performance gains in rate-savings attained by VSYNC compared to directly sending the updated video files using H.26x or synchronizing using universal file synchronization protocols such as rsync.
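The rsync-style difference detection at the heart of VSYNC can be illustrated with a simplified block-hash exchange. This is a generic sketch, not the actual VSYNC protocol: it ignores video semantics and the hierarchical hashing and lossy coding described above, and uses fixed-size blocks with a single strong hash rather than rsync's rolling checksum.

```python
import hashlib

def block_hashes(data: bytes, block: int = 4):
    # hash fixed-size blocks of the local file
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]

def delta(old: bytes, new: bytes, block: int = 4):
    # the receiver sends hashes of its (old) blocks;
    # the sender returns only the blocks that differ
    old_h = block_hashes(old, block)
    patch = {}
    for i in range(0, len(new), block):
        j = i // block
        chunk = new[i:i + block]
        if j >= len(old_h) or hashlib.sha256(chunk).hexdigest() != old_h[j]:
            patch[j] = chunk   # transmit only this block
    return patch

def apply_delta(old: bytes, patch: dict, new_len: int, block: int = 4):
    # receiver reconstructs the new file from its old copy plus the patch
    blocks = [old[i:i + block] for i in range(0, len(old), block)]
    for j, chunk in patch.items():
        while len(blocks) <= j:
            blocks.append(b"")
        blocks[j] = chunk
    return b"".join(blocks)[:new_len]
```

Only the differing blocks cross the channel; VSYNC additionally exploits that "close enough" video blocks need not be retransmitted at all under a distortion constraint.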
We consider the problem of communicating compact descriptors for the purpose of establishing visual correspondences
between two cameras operating under rate constraints. Establishing visual correspondences is a critical
step before other tasks such as camera calibration or object recognition can be performed in a network of cameras.
We verify that descriptors of regions which are in correspondence are highly correlated, and propose the use
of distributed source coding to reduce the bandwidth needed for transmitting descriptors required to establish
correspondence. Our experiments demonstrate that the proposed scheme is able to provide compression gains of
57% with minimal loss in the number of correctly established correspondences compared to a scheme that communicates
the entire image of the scene losslessly in compressed form. Over a wide range of rates, the proposed
scheme also provides superior performance when compared to simply transmitting all the feature descriptors.
KEYWORDS: Computer programming, Video coding, Video, Data modeling, Video compression, Quantization, Binary data, Motion estimation, Data compression, Motion models
Recently, a new class of distributed source coding (DSC) based video coders has been proposed to enable low-complexity
encoding. However, to date, these low-complexity DSC-based video encoders have been unable to
compress as efficiently as motion-compensated predictive coding based video codecs, such as H.264/AVC, due to
insufficiently accurate modeling of video data. In this work, we examine achieving H.264-like high compression
efficiency with a DSC-based approach without the encoding complexity constraint. The success of H.264/AVC
highlights the importance of accurately modeling the highly non-stationary video data through fine-granularity
motion estimation. This motivates us to deviate from the popular approach of approaching the Wyner-Ziv
bound with sophisticated capacity-achieving channel codes that require long block lengths and high decoding
complexity, and instead focus on accurately modeling video data. Such a DSC-based, compression-centric encoder
is an important first step towards building a robust DSC-based video coding framework.
We propose a novel method of exploiting inter-view correlation among cameras that have overlapping views in order to deliver error-resilient video in a distributed multi-camera system. The main focus in this work is on robustness, which is urgently needed in a wireless setting. Our system has low encoding complexity, is robust while satisfying tight latency constraints, and requires no inter-sensor communication. In this work, we build on and generalize PRISM [Puri2002], an earlier proposed single-camera distributed video compression system. Specifically, decoder motion search, a key attribute of single-camera PRISM, is extended to the multi-view setting to include decoder disparity search based on two-view camera geometry. Our proposed system, dubbed PRISM-MC (PRISM multi-camera), achieved PSNR gains of up to 1.7 dB over a PRISM-based simulcast solution in experiments over a wireless channel simulator.
A subspace-based method for denoising with a frame works as follows:
If a signal is known to have a sparse representation with respect to
the frame, the signal can be estimated from a noise-corrupted observation of the signal by finding the best sparse approximation to the observation. The ability to remove noise in this manner depends on the frame being designed to efficiently represent the signal while it inefficiently represents the noise. This paper gives bounds to show how inefficiently white Gaussian noise is represented by sparse linear combinations of frame vectors. The bounds hold for any frame so they are generally loose for frames designed to represent structured signals. Nevertheless, the bounds can be combined with knowledge of the approximation efficiency of a given family of frames for a given signal class to study the merit of frame redundancy for denoising.
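As a concrete illustration of sparse approximation over a frame, the following sketch denoises a signal with greedy matching pursuit over a random dictionary. This is an illustrative example only; the paper's bounds apply to any frame and any sparse approximation procedure, not to this particular algorithm.

```python
import numpy as np

def matching_pursuit(y, D, k):
    """Greedy k-term sparse approximation of y over the atoms (columns) of D."""
    D = D / np.linalg.norm(D, axis=0)   # unit-norm atoms
    r = y.astype(float).copy()          # residual
    approx = np.zeros_like(r)
    for _ in range(k):
        corr = D.T @ r                  # correlate residual with every atom
        j = np.argmax(np.abs(corr))     # pick the best-matching atom
        approx += corr[j] * D[:, j]
        r -= corr[j] * D[:, j]
    return approx

# A 2-sparse signal over a random frame, plus white Gaussian noise:
# the k-sparse approximation retains the signal while representing
# only a small fraction of the noise energy, as the bounds predict.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
a7 = D[:, 7] / np.linalg.norm(D[:, 7])
a42 = D[:, 42] / np.linalg.norm(D[:, 42])
clean = 3.0 * a7 + 2.0 * a42
noisy = clean + 0.15 * rng.standard_normal(64)
denoised = matching_pursuit(noisy, D, k=2)
```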
Wavelet thresholding is a powerful tool for denoising images and
other signals with sharp discontinuities. Using different wavelet
bases gives different results, and since the wavelet transform is
not time-invariant, thresholding various shifts of the signal is
one way to use different wavelet bases. This paper describes
several denoising methods that apply wavelet thresholding or
variations on wavelet thresholding recursively. (We previously
termed one of these methods "recursive cycle spinning.") These methods are compared experimentally for denoising piecewise
polynomial signals. Though similar, the methods differ in
computational complexity, convergence speed, and sensitivity to
threshold selection.
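The basic building block of these methods can be sketched as plain (non-recursive) cycle spinning over a one-level Haar transform: threshold every circular shift of the signal and average the shifted-back results. This minimal sketch omits the multi-level transforms and the recursive variants studied in the paper.

```python
import numpy as np

def haar_denoise(x, thresh):
    # one-level Haar transform, soft-threshold the detail band, invert
    a = (x[0::2] + x[1::2]) / np.sqrt(2)              # approximation
    d = (x[0::2] - x[1::2]) / np.sqrt(2)              # detail
    d = np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0)
    y = np.empty_like(x)
    y[0::2] = (a + d) / np.sqrt(2)
    y[1::2] = (a - d) / np.sqrt(2)
    return y

def cycle_spin(x, thresh, shifts):
    # denoise every circular shift, shift back, and average:
    # this compensates for the transform not being shift-invariant
    acc = np.zeros_like(x)
    for s in range(shifts):
        acc += np.roll(haar_denoise(np.roll(x, s), thresh), -s)
    return acc / shifts

rng = np.random.default_rng(1)
clean = np.repeat([0.0, 4.0], 32)                     # piecewise constant signal
noisy = clean + 0.5 * rng.standard_normal(64)
denoised = cycle_spin(noisy, thresh=1.0, shifts=8)
```

Averaging over shifts suppresses the shift-dependent artifacts near the discontinuity that any single-basis thresholding produces.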
KEYWORDS: Forward error correction, Data hiding, Computer programming, Quantization, Signal to noise ratio, Digital watermarking, Distortion, Data compression, Composites, Bridges
It has recently been discovered that many current applications such as data hiding and watermarking can be posed as the problem of channel coding with side information. As a result there has been considerable interest in designing codes that attain the theoretical capacity of the problem. It was shown by Pradhan et al. that in order to achieve capacity, a powerful channel codebook that partitions into a powerful source codebook should be chosen. The data to be embedded will index the source codebook partition. The constructions that exist in the literature, however, are typically based on powerful channel codebooks and weak source codebook partitions and hence remain at a considerable gap to capacity. In this paper, we present several methods of construction that are based on a powerful channel codebook (i.e., turbo codes) and powerful source codebook partitions (i.e., trellis-coded quantization) to bridge the gap to capacity. For the Gaussian channel coding with side information (CCSI) problem at a transmission rate of 1 bit/channel use, our proposed approach comes within 2.72 dB of the information-theoretic capacity.
Wavelet transform-based methods are currently used in a variety of image and video processing applications and are popular candidates for future image and video processing standards. Very little, however, has been done to develop efficient and simple stochastic models for wavelet image data. In this paper we review some existing modeling approaches for wavelet image data. Inspired by our recent estimation-quantization image coder, we introduce an efficient graphical stochastic model for wavelet image coefficients. Specifically, we propose to model wavelet image coefficients as Gaussian random variables with parameters determined by an underlying hidden Markov-type process. This stochastic model is defined using a factor graph framework. We test our model for denoising images corrupted by additive Gaussian noise. Our results are among the state-of-the-art in the field and they indicate the promise of the proposed model.
This paper derives bounds on the performance of statistical object recognition systems, wherein an image of a target is observed by a remote sensor. Detection and recognition problems are modeled as composite hypothesis testing problems involving nuisance parameters. We develop information-theoretic performance bounds on target recognition based on statistical models for sensors and data, and examine conditions under which these bounds are tight. In particular, we examine the validity of asymptotic approximations to probability of error in such imaging problems. Applications to target recognition based on compressed sensor image data are given. This study provides a systematic and computationally attractive framework for analytically characterizing target recognition performance under complicated, non-Gaussian models, and optimizing system parameters.
This work introduces a multiple-description product code which aims at optimally generating multiple, equally-important wavelet image descriptions. The codes used are a concatenated channel code including a row (outer) code based on RCPC codes with CRC error detection and a source-channel column (inner) code consisting of the scalable SPIHT image coder and an optimized array of unequal protection Reed-Solomon erasure-correction codes. By systematically matching the unequal protection codes to the embedded source bitstream using a simple, fast optimizer that can run in real time, we allow image quality to degrade gracefully as fade worsens and maximize expected image quality at the receiver. This approach to image transmission over fading channels offers significant improvements in both peak and expected image quality when compared to current state-of-the-art techniques. Our packetization scheme is also naturally suited for hybrid packet-network/wireless channels, such as those used for wireless Internet access.
Inspired by a recently proposed constructive framework for the distributed source coding problem, we propose a powerful constructive approach to the watermarking problem, emphasizing the dual roles of 'source codes' and 'channel codes.' In our framework, we explore various source and channel codes to achieve watermarks that are robust to attackers in terms of maximizing the distortion between the corrupted coded-source signal and the original signal while holding the distortion between the coded-source signal and the original signal constant. We solve the resulting combinatorial optimization problem using an original technique based on robust optimization and convex programming.
In this work, we introduce a hidden Markov field model for wavelet image coefficients within a subband and apply it to the image denoising problem. Specifically, we propose to model wavelet image coefficients within subbands as Gaussian random variables with parameters determined by the underlying hidden Markov process. Our model is inspired by the recent Estimation-Quantization (EQ) image coder and its excellent performance in compression. To reduce the computational complexity we apply a novel factor graph framework to combine two 1-D hidden Markov chain models to approximate a hidden Markov Random field (HMRF) model. We then apply the proposed models for wavelet image coefficients to perform an approximate Minimum Mean Square Error (MMSE) estimation procedure to restore an image corrupted by an additive white Gaussian noise. Our results are among the state-of-the-art in the field and they indicate the promise of the proposed modeling techniques.
Recently, a constructive practical framework for the problem of source coding with side information was proposed. In this work we address the problem of denoising of images with side information at the decoder. We approach this by adding a digital side channel and decoding the digital bits with the help of a noisy image (side information) at the decoder. The encoder needs to compress the image using the knowledge that the decoder has access to a noisy version of the image. We propose a rate allocation technique to optimally allocate the rate among the wavelet coefficients of the image. At a transmission rate of 0.8175 bits/source sample, we get a gain of 1.665 dB over conventional source coding techniques. This is achieved by modifying only 9% of the wavelet coefficients in the conventional source coder.
This paper addresses optimizing cache allocation in a distributed image database system over computer networks. We consider progressive image file formats, and `soft' caching strategies, in which each image is allocated a variable amount of cache memory, in an effort to minimize the expected image transmission delay time. A simple and efficient optimization algorithm is proposed, and is generalized to include multiple proxies in a network scenario. With optimality proven, our algorithms are surprisingly simple, and are based on sorting the images according to a special priority index. We also present an adaptive cache allocation/replacement strategy that can be incorporated into web browsers with little computational overhead. Simulation results are presented.
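The sort-then-fill structure of such a priority-index algorithm can be sketched as follows. The priority index used here (access probability times delay saved per cached byte) is a hypothetical placeholder; the paper derives its own index from its delay model, and partial allocation relies on the progressive image formats mentioned above.

```python
def allocate_cache(images, capacity):
    """Greedy soft-caching sketch: images is a list of
    (name, size_bytes, access_prob, delay_saved_per_byte) tuples."""
    # hypothetical priority index: expected delay saved per cached byte
    ranked = sorted(images, key=lambda im: im[2] * im[3], reverse=True)
    plan, used = [], 0
    for name, size, p, g in ranked:
        take = min(size, capacity - used)   # progressive formats allow caching a prefix
        if take <= 0:
            break
        plan.append((name, take))
        used += take
    return plan
```

Because the allocation reduces to a single sort plus a linear fill, it is cheap enough to rerun adaptively as access statistics change, which is what makes a browser-embedded replacement strategy practical.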
In this paper, we address the issue of designing a two- channel linear phase biorthogonal filter bank that maximizes the two most desired properties for the wavelet transform in image coding applications, namely, orthogonality and energy compaction. Proper cost functions are formulated for these two criteria and an efficient signal-adaptive optimization algorithm is proposed. Our algorithm is motivated by a number of interesting properties of the correlation matrix of typical image signals, and uses lifting operations to efficiently represent the degrees of freedom subject to perfect reconstruction conditions. In addition, it offers a successive tradeoff between our two optimization goals. Experimental results on the popular Daubechies 9-7 and 10-18 filter banks reveal that considerable improvements in terms of both orthogonality and energy compaction can be achieved through the proposed optimization technique.
We study the performance difference of the discrete cosine transform (DCT) and the wavelet transform for both image and video coding, while comparing the other aspects of the coding system on an equal footing. Based on the state-of-the-art coding techniques, we point out that, for still images, the performance gap between DCT and wavelet based coding is within one dB in PSNR at the same bitrate. This is in contrast to the common perception that the wavelet transform is much superior to the DCT for image compression. For video coding, the advantage of using the wavelet transform over the DCT is even less pronounced.
This paper introduces a new approach to inverse halftoning using nonorthogonal wavelets. The distinct features of this wavelet-based approach are: a) edge information in the highpass wavelet images of a halftone is extracted and used to assist inverse halftoning, b) cross-scale correlations in the multiscale wavelet decomposition are used for removing background halftoning noise while preserving important edges in the wavelet lowpass image, c) experiments show that our simple wavelet-based approach outperforms the best results obtained from inverse halftoning methods published in the literature, which are iterative in nature.
We address efficient context modeling in arithmetic coding for wavelet image compression. Quantized
highpass wavelet coefficients are first mapped into a binary source, followed by high order context
modeling in arithmetic coding. A blending technique is used to combine results of context modeling
of different orders into a single probability estimate. Experiments show that an arithmetic coder with
efficient context modeling is capable of achieving a 10% bitrate saving (or 0.5 dB gain in PSNR) over
a zeroth order adaptive arithmetic coder in high performance wavelet image coders.
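The blending step can be sketched as a weighted combination of adaptive binary models of different context orders. This is a generic count-based scheme with arbitrary fixed weights; the paper's exact context definitions and blending weights are not reproduced here.

```python
class BinaryContextModel:
    """Adaptive binary model: Laplace-smoothed zero/one counts per context."""
    def __init__(self):
        self.counts = {}

    def prob_one(self, ctx):
        c0, c1 = self.counts.get(ctx, (1, 1))
        return c1 / (c0 + c1)

    def update(self, ctx, bit):
        c0, c1 = self.counts.get(ctx, (1, 1))
        self.counts[ctx] = (c0 + (bit == 0), c1 + (bit == 1))

def blended_prob_one(models, ctxs, weights):
    # combine the per-order estimates into the single probability
    # that would drive the arithmetic coder
    total = sum(weights)
    return sum(w * m.prob_one(c) for m, c, w in zip(models, ctxs, weights)) / total

# order-0 (empty context) and order-1 (previous bit) models over a bit stream
order0, order1 = BinaryContextModel(), BinaryContextModel()
bits, prev = [0, 1, 0, 1, 0, 1, 0, 1], 0
for b in bits:
    p = blended_prob_one([order0, order1], [(), (prev,)], [1.0, 3.0])
    order0.update((), b)
    order1.update((prev,), b)
    prev = b
```

On this alternating stream the order-0 model stays near 1/2 while the order-1 model becomes confident, so the blend assigns shorter codewords than a zeroth-order coder could.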
We introduce a novel region-based video compression framework based on morphology to efficiently capture motion correspondences between consecutive frames in an image sequence. Our coder is built on the observation that the motion field associated with typical image sequences can be segmented into component motion subfield 'clusters' associated with distinct objects or regions in the scene, and further that these clusters can be efficiently captured using morphological operators in a 'backward' framework that avoids the need to send region shape information. Region segmentation is performed directly on the motion field by introducing a small 'core' for each cluster that captures the essential features of the cluster and reliably represents its motion behavior. Cluster matching is used in lieu of the conventional block matching methods of standard video codecs to define a cluster motion representation paradigm. Furthermore, a region-based pel-recursive approach is applied to find the refinement motion field for each cluster and the cluster motion prediction error image is coded by a novel adaptive scalar quantization method. Experimental results reveal a 10-20 percent reduction in prediction error energy and 0.5-2 dB gain in final reconstructed PSNR over the standard MPEG-12 coder at typical bitrates of 500 kbit/s to 1 Mbit/s on standard test sequences.
We introduce a new image compression framework that combines compression efficiency with speed, and is based on an independent infinite mixture model which accurately captures the space-frequency characterization of the wavelet image representation. Specifically, we model individual image wavelet coefficients as being drawn from an independent generalized Gaussian distribution (GGD) field of zero mean and unknown spatially-varying variances. Based on this model, we develop a powerful estimation-quantization (EQ) framework that consists of: (i) first finding the maximum-likelihood estimate of the individual spatially-varying coefficient field variances based on causal and quantized spatial neighborhood contexts; and (ii) then applying an off-line rate-distortion (R-D) optimized quantization/entropy coding strategy, implemented as a fast lookup table, that is optimally matched to the derived variance estimates. A distinctive feature of our framework is the dynamic partitioning of wavelet data into subsets representing coefficients that are 'predictable' and 'unpredictable' respectively from their quantized causal contexts. The statistical parameters of the 'unpredictable' set in each subband, obtained through a fast, R-D based, simple thresholding first-pass operation, represent the negligible parametric side-information for use in the forward adaptation mode. The combination of the powerful infinite mixture model, the dynamic switching between forward and backward adaptation modes, and the theoretical soundness and speed of the EQ framework lead to a novel, high-performing, and fast image coder that is extremely competitive with the best published coders in the literature across all classes of images and target bit rates of interest, in both compression efficiency and processing speed. For example, our coder exceeds the objective performance of the best zerotree-based wavelet coder at all bit rates for all tested images at a fraction of its complexity.
At low to medium bit rates, our preliminary results appear to exceed all reported results in the wavelet image coding literature, to the best of our knowledge.
We introduce a novel image-adaptive encoding scheme for the baseline JPEG standard that maximizes the decoded image quality without compromising compatibility with current JPEG decoders. Our algorithm jointly optimizes quantizer selection, coefficient 'thresholding', and entropy coding within a rate-distortion (R-D) framework. It unifies two previous approaches to image-adaptive JPEG encoding: R-D optimized quantizer selection by Wu and Gersho, and R-D optimal coefficient thresholding by Ramchandran and Vetterli. By formulating an algorithm which optimizes these two operations jointly, we have obtained performance that is the best in the reported literature for JPEG-compatible coding. In fact the performance of this JPEG coder is comparable to that of more complex 'state-of-the-art' image coding schemes: e.g., for the benchmark 512 by 512 'Lenna' image at a coding rate of 1 bit per pixel, our algorithm achieves a peak signal-to-noise ratio of 39.6 dB, which represents a gain of 1.7 dB over JPEG using the example Q-matrix with a customized Huffman entropy coder, and even slightly exceeds the published performance of Shapiro's celebrated embedded zerotree wavelet coding scheme. Furthermore, with the choice of appropriate visually-based error metrics, noticeable subjective improvement has been achieved as well. The reason for our algorithm's superior performance can be attributed to its conceptual equivalence to the application of entropy-constrained vector quantization design principles to a JPEG-compatible framework. Furthermore, our algorithm may be applied to other systems that use run-length encoding, including intra-frame MPEG and subband or wavelet coding.
Recently, a new class of image coding algorithms coupling standard scalar quantization of frequency coefficients with tree-structured quantization has attracted wide attention because its good performance appears to confirm the promised efficiencies of hierarchical representation. This paper addresses the problem of how spatial quantization modes and standard scalar quantization can be applied in a jointly optimal fashion in an image coder. We consider zerotree quantization and the simplest form of scalar quantization, formalize the problem of optimizing their joint application, and develop an image coding algorithm for solving the resulting optimization problem. Despite the basic form of the two quantizers considered, the resulting algorithm demonstrates coding performance that is competitive with the very best coding algorithms in the literature.
A novel wavelet packet image coder is introduced in this paper. It is based on our previous work on wavelet image coding using space-frequency quantization (SFQ), in which zerotree quantization and scalar quantization are jointly optimized in a rate-distortion sense. In this paper, we extend the powerful SFQ coding paradigm from the wavelet transform to the more general wavelet packet transformation. The resulting wavelet packet coder offers a universal transform coding framework within the constraints of filter bank structures by allowing joint transform and quantizer design without assuming a priori statistics of the input image. In other words, the new coder adaptively chooses the representation to suit the image and the quantization to suit the representation. Experimental results show that, for some image classes, our new coder is capable of achieving the best coding performances among those in the published literature.
In standard video coding algorithms, image frames are represented partly through motion information and partly through intensity information. Through motion information, a new frame is described with respect to the previous frame in the video sequence, and through intensity information it is described without reference to the previous frame. As long as consecutive frames are related through motion, motion-based information generally yields more efficient representation than intensity-based information. This observation has motivated the development of a new coding scheme in which image sequences are represented solely in terms of their motion fields. In the new coding scheme every pixel in a new image frame is assigned a motion vector that fully conveys the new pixel intensity. The resulting full resolution motion field fully characterizes the new frame. The motion field is then coded and sent to the decoder. Such a motion-based representation has been shown to be more efficient than standard hybrid representations. In this paper the motion-based coding scheme is generalized to allow variable block sizes. Image frames are represented in terms of quadtrees that describe their division into blocks of varying sizes. Simulations show that variable block sizes allow for more efficient motion-based coding of video sequences than fixed block sizes do.
We address the problem of optimal forward-adaptive quantization in the video and image coding framework. In this framework, as is consistent with that of most practical coders like MPEG, the encoder has the capability of changing the quantizer periodically (e.g. at a macroblock interval in MPEG). In this paper, we formulate an optimal strategy, based on dynamic programming, for updating the quantizer choice for coding an image or video signal. While in some coding environments the overhead needed to specify the quantizer used by each block is equal for every choice of quantizer, in other situations (e.g. MPEG) the overhead cost is higher if the quantizer changes from one block to the next. We concentrate on the latter case which will be more likely encountered in situations where the overhead represents a significant fraction of the overall rate, as can be the case if a low bit rate is used (e.g. error frames in a typical motion-compensated video coder). We provide empirical evidence of the performance gain that can be obtained when applying our optimal algorithm to typical motion-compensated prediction error frames in MPEG, showing how the popular Viterbi algorithm can be used to find the optimal solution.
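The search over quantizer sequences with a change-dependent overhead can be sketched as a small Viterbi recursion. The costs below are hypothetical placeholders; in the paper they come from measured rate-distortion values for each block and quantizer.

```python
def optimal_quantizers(costs, switch_cost):
    """Viterbi search: costs[t][q] is the R-D cost of coding block t with
    quantizer q; switch_cost is the extra overhead paid whenever the
    quantizer changes between consecutive blocks."""
    T, Q = len(costs), len(costs[0])
    best = list(costs[0])                   # best path cost ending in q at block 0
    back = [[0] * Q]                        # backpointers (unused at block 0)
    for t in range(1, T):
        cur, bp = [], []
        for q in range(Q):
            # cheapest way to arrive at quantizer q, given the switching penalty
            trans = [best[p] + (0 if p == q else switch_cost) for p in range(Q)]
            p = min(range(Q), key=trans.__getitem__)
            cur.append(trans[p] + costs[t][q])
            bp.append(p)
        best, back = cur, back + [bp]
    # trace the optimal path backwards from the cheapest final state
    q = min(range(Q), key=best.__getitem__)
    path = [q]
    for t in range(T - 1, 0, -1):
        q = back[t][q]
        path.append(q)
    return path[::-1], min(best)
```

With a large switching penalty the recursion holds one quantizer across blocks even when a per-block greedy choice would switch; with zero penalty it reduces to picking the cheapest quantizer per block.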
Wavelet image decompositions generate a tree-structured set of coefficients, providing a hierarchical data structure for representing images. Recent efforts in image compression have focused on new ways for exploiting dependencies within this hierarchy of wavelet coefficients, and these efforts are achieving significant improvements in coding performance over traditional subband coding approaches. This paper presents a statistical framework for understanding the efficiency of these new zerotree wavelet coding algorithms, and describes our own wavelet coding algorithm which optimally couples scalar and zerotree quantization modes. Building from this framework, we describe a radically new approach to image compression (dubbed morphological representation of wavelet data or MRWD) based on the statistical characterization of morphological sets of wavelet coefficients (a morphological set is a set of coefficients related to other coefficients via morphological operators). Preliminary simulation of the new algorithm shows very promising results.
We examine the question of how to choose a time-varying filter bank representation for a signal which is optimal with respect to an additive cost function. The tree structure gives segmentations of the signal in frequency, while the time-varying nature of the tree gives segmentations in time. We present an efficient algorithm which finds the optimal basis, given the constraint that the time and frequency segmentations are binary. Extension to multiple dimensions is simple. We verify that the algorithm indeed produces a lower cost representation than any of the wavelet packet representations for compression of images using a simple rate-distortion cost.
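The single-tree core of this search is the classical best-basis recursion: each node compares its own cost against the sum of its children's best costs and splits only when that sum is lower. The sketch below uses a Haar split and a toy additive cost (a significance count); the paper extends this pruning to binary segmentations in time as well as frequency.

```python
import numpy as np

def cost(c):
    # toy additive cost: number of coefficients above a small threshold
    return int(np.count_nonzero(np.abs(c) > 0.1))

def haar_split(c):
    # split a node into lowpass and highpass children
    return (c[0::2] + c[1::2]) / np.sqrt(2), (c[0::2] - c[1::2]) / np.sqrt(2)

def best_basis(c, depth):
    """Return (best cost, decision) for this node: keep its coefficients,
    or split whenever the children's best costs sum to something lower."""
    keep = cost(c)
    if depth == 0 or len(c) < 2:
        return keep, "leaf"
    lo, hi = haar_split(c)
    cl, _ = best_basis(lo, depth - 1)
    ch, _ = best_basis(hi, depth - 1)
    if cl + ch < keep:
        return cl + ch, "split"
    return keep, "leaf"
```

A constant signal of length 8 costs 8 at the root but 1 after full splitting (one scaling coefficient, all details zero), so the recursion chooses to split; because each node is visited once, the whole search is linear in the number of tree nodes.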