We present a framework for analyzing the structure of digital media streams. Though our methods work for video, text, and audio, we concentrate on detecting the structure of digital music files. In the first step, spectral data is used to construct a similarity matrix calculated from inter-frame spectral similarity. The digital audio can be robustly segmented by correlating a kernel along the diagonal of the similarity matrix. Once segmented, spectral statistics of each segment are computed. In the second step, segments are clustered based on the self-similarity of their statistics. This reveals the structure of the digital music in a set of segment boundaries and labels. Finally, the music is summarized by selecting clusters with repeated segments throughout the piece. The summaries can be customized for various applications based on the structure of the original music.
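The segmentation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine measure, the checkerboard-shaped kernel, the zero padding at the edges, and the function names `similarity_matrix`, `checkerboard_kernel`, and `novelty_score` are all assumptions chosen for illustration. The input is taken to be one spectral feature vector per frame.

```python
import numpy as np

def similarity_matrix(features):
    """Cosine similarity between every pair of frames (rows of `features`)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against all-zero frames
    return unit @ unit.T

def checkerboard_kernel(half_width):
    """(2L x 2L) kernel: +1 on the two within-segment quadrants, -1 on the cross quadrants."""
    signs = np.concatenate([-np.ones(half_width), np.ones(half_width)])
    return np.outer(signs, signs)

def novelty_score(sim, half_width):
    """Correlate the kernel along the main diagonal of `sim`.

    score[i] is large when frames just before i and frames from i onward
    are each self-similar but dissimilar to one another.
    """
    n = sim.shape[0]
    kernel = checkerboard_kernel(half_width)
    # Zero padding (an assumption) lets the kernel run off the matrix edges.
    padded = np.pad(sim, half_width, mode="constant")
    score = np.zeros(n)
    for i in range(n):
        window = padded[i:i + 2 * half_width, i:i + 2 * half_width]
        score[i] = np.sum(window * kernel)
    return score

# Toy input: 20 frames of one "spectrum" followed by 20 frames of another.
features = np.vstack([np.tile([1.0, 0.0], (20, 1)),
                      np.tile([0.0, 1.0], (20, 1))])
sim = similarity_matrix(features)
score = novelty_score(sim, half_width=4)
boundary = int(np.argmax(score))  # frame index where the second segment starts
```

Peaks in the score mark candidate segment boundaries. On real audio one would pick local maxima above a threshold rather than a single argmax, and the kernel width sets the time scale of the changes that are detected.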
We present a framework, motivated by rate-distortion theory and the human visual system, for optimally representing the real world given limited video resolution. To provide users with high fidelity views, we built a hybrid video camera system that combines a fixed wide-field panoramic camera with a controllable pan/tilt/zoom (PTZ) camera. In our framework, a video frame is viewed as a limited-frequency representation of some "true" image function. Our system combines outputs from both cameras to construct the highest fidelity views possible, and controls the PTZ camera to maximize information gain available from higher spatial frequencies. In operation, each remote viewer is presented with a small panoramic view of the entire scene, and a larger close-up view of a selected region. Users may select a region by marking the panoramic view. The system operates the PTZ camera to best satisfy requests from multiple users. When no regions are selected, the system automatically operates the PTZ camera to minimize predicted video distortion. High-resolution images are cached and sent if a previously recorded region has not changed and the PTZ camera is pointed elsewhere. We present experiments demonstrating that the panoramic image can effectively predict where to gain the most information, and also that the system provides better images to multiple users than conventional camera systems.
KEYWORDS: Distance measurement, Quantization, Image retrieval, Brain-machine interfaces, Multimedia, Acoustics, Data modeling, Speech recognition, Databases, Microsoft Foundation Class Library
Though many systems exist for content-based retrieval of images, little work has been done on the audio portion of the multimedia stream. This paper presents a system to retrieve audio documents by acoustic similarity. The similarity measure is based on statistics derived from a supervised vector quantizer, rather than matching simple pitch or spectral characteristics. The system is thus able to learn distinguishing audio features while ignoring unimportant variation. Both theoretical and experimental results are presented, including quantitative measures of retrieval performance. Retrieval was tested on a corpus of simple sounds as well as a corpus of musical excerpts. The system is purely data-driven and does not depend on particular audio characteristics. Given a suitable parameterization, this method may thus be applicable to image retrieval as well.
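The idea of comparing documents through quantizer statistics can be sketched as follows. This is an illustration under stated assumptions, not the system from the abstract: a plain nearest-centroid codebook stands in for the supervised vector quantizer, the per-document statistic is taken to be a normalized histogram of code indices, and cosine similarity is assumed as the comparison. All function names and the toy data are hypothetical.

```python
import numpy as np

def quantize(frames, codebook):
    """Index of the nearest codebook vector for each frame (row)."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def code_histogram(frames, codebook):
    """Normalized count of how often each code is used by a (non-empty) document."""
    codes = quantize(frames, codebook)
    counts = np.bincount(codes, minlength=len(codebook)).astype(float)
    return counts / counts.sum()

def cosine_similarity(p, q):
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

def rank_by_similarity(query_frames, corpus, codebook):
    """Return corpus document names ordered from most to least similar to the query."""
    q = code_histogram(query_frames, codebook)
    scores = {name: cosine_similarity(q, code_histogram(frames, codebook))
              for name, frames in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy two-entry codebook and two toy "documents" of 2-D feature frames.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
corpus = {
    "a": np.array([[0.0, 0.0], [0.5, 0.5], [9.0, 9.0]]),
    "b": np.array([[10.0, 10.0], [9.0, 9.0]]),
}
query = np.array([[0.1, 0.0], [0.0, 0.2], [0.3, 0.1]])
ranking = rank_by_similarity(query, corpus, codebook)
```

Because the comparison operates on code-usage statistics rather than on the raw features, documents of different lengths can be compared directly, and the quantizer determines which variation in the features is treated as significant.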
Conference Committee Involvement (2)
Storage and Retrieval Methods and Applications for Multimedia 2005
18 January 2005 | San Jose, California, United States
Storage and Retrieval Methods and Applications for Multimedia 2004
20 January 2004 | San Jose, California, United States