We give an overview of existing audio analysis approaches in the compressed domain and incorporate them into a coherent formal structure. After examining the kinds of information accessible in an MPEG-1 compressed audio stream, we describe a coherent approach to deriving features from that information and report on a number of applications these features enable. Most of these applications aim at creating an index to the audio stream by segmenting it into temporally coherent regions, which may be classified into pre-specified types of sound such as music, speech, speakers, animal sounds, sound effects, or silence. Other applications centre on recognition tasks such as gender, beat, or speech recognition.
The importance of perceptual modeling for the calculation of sound features is well known. Using simple perception-based adaptations of physically measured stimuli, such as the dB scale or loudness, is a minimal requirement. Exactly how much value can be gained from more complex perceptual modeling has not been investigated in detail. The paper examines this question for loudness measures, using well-known psychoacoustic knowledge for their calculation. Profiles of these measures are calculated on audio data from movie material, deliberately using 'natural' sound instead of reverting to artificial sounds in the laboratory. Ultimately, the quality of a sound feature can only be judged by comparison to human estimates. Therefore, test subjects were asked to express their perception of loudness by continuous classification into five classes (called pp, p, mf, f, and ff). The results were used to evaluate two loudness measures: the sound pressure level and an integral loudness measure developed in the discussed research. The correlation of the human loudness estimates with the integral loudness measure is about 10 percent higher than their correlation with the sound pressure level. In addition, the integral loudness measure yields a significantly better approximation of the curve of human loudness estimates.
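The evaluation described in this abstract, comparing a level profile with human loudness classes, can be illustrated with a minimal sketch. It is not the paper's method: the integral loudness measure is not specified here and is not reproduced. The function names, the RMS-based level relative to an arbitrary reference `ref` (a true sound pressure level would need calibration against 20 µPa), and the mapping of the classes pp to ff onto ranks 1 to 5 are all assumptions made for illustration.

```python
import math

# Assumed ordinal mapping of the five loudness classes (hypothetical choice).
CLASS_TO_RANK = {"pp": 1, "p": 2, "mf": 3, "f": 4, "ff": 5}


def frame_level_db(samples, ref=1.0, floor=1e-12):
    """RMS level of one frame in dB relative to `ref` (uncalibrated)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # `floor` avoids log10(0) on digitally silent frames.
    return 20.0 * math.log10(max(rms, floor) / ref)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy frames with constant amplitude and made-up listener labels.
    frames = [[0.01] * 8, [0.1] * 8, [0.5] * 8, [1.0] * 8]
    labels = ["pp", "p", "f", "ff"]

    levels = [frame_level_db(f) for f in frames]
    ranks = [CLASS_TO_RANK[label] for label in labels]
    print("levels (dB):", [round(v, 2) for v in levels])
    print("correlation:", round(pearson(levels, ranks), 3))
```

Under these assumptions, a competing loudness measure would be evaluated by passing its profile to `pearson` in place of `levels` and comparing the two coefficients.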
Conference Committee Involvement (5)
Multimedia Content Access: Algorithms and Systems
31 January 2007 | San Jose, CA, United States
Multimedia Content Analysis, Management, and Retrieval 2006
18 January 2006 | San Jose, California, United States
Storage and Retrieval Methods and Applications for Multimedia 2005
18 January 2005 | San Jose, California, United States
Storage and Retrieval Methods and Applications for Multimedia 2004
20 January 2004 | San Jose, California, United States