Nowadays, there is a strong need for the efficient organization of an ever-increasing amount of home video content. Creating an effective system for managing home video content requires categorizing that content in a semantic way. A significant amount of research has already been dedicated to semantic video categorization. However, conventional categorization approaches often rely on unnecessary concepts and complicated algorithms that are not suited to home video categorization. To overcome this problem, this paper proposes a novel home video categorization method that adopts semantic home photo categorization. To use home photo categorization in the context of home video, we segment the video content into shots and extract key frames that represent each shot. To extract the semantics of a key frame, we divide it into ten local regions and extract low-level features from each region. Based on the low-level features extracted for each local region, we predict the semantics of the key frame. To verify the usefulness of the proposed method, experiments were performed with 70 home video sequences labeled with concepts that are part of the MPEG-7 VCE2 data set. For these sequences, the proposed system achieved a recall of 77% and an accuracy of 78%.
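The region-based step can be illustrated with a short sketch. The following Python snippet is a minimal, hypothetical illustration, not the paper's implementation: it assumes a 2x5 grid for the ten local regions (the abstract does not specify the layout) and uses per-region color histograms as the low-level features.

```python
import numpy as np

def region_features(frame: np.ndarray, rows: int = 2, cols: int = 5, bins: int = 8) -> np.ndarray:
    """Concatenated per-region color histograms for an H x W x 3 key frame."""
    h, w, _ = frame.shape
    feats = []
    for r in range(rows):
        for c in range(cols):
            region = frame[r * h // rows:(r + 1) * h // rows,
                           c * w // cols:(c + 1) * w // cols]
            # One histogram per color channel, normalized by the region's pixel count.
            hists = [np.histogram(region[..., ch], bins=bins, range=(0, 256))[0]
                     for ch in range(3)]
            feats.append(np.concatenate(hists) / region[..., 0].size)
    return np.concatenate(feats)  # 10 regions x 3 channels x 8 bins = 240 dims
```

A concept classifier trained on the MPEG-7 VCE2 labels (for instance, one SVM per concept) would then map this 240-dimensional vector to the semantics of the key frame.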
As online image sharing services become popular, correctly annotated tags are increasingly important for precise search and retrieval. Tags created by users along with user-generated content (UGC) are often ambiguous, as some tags are highly subjective and visually unrelated to the image; such tags produce unwanted results when image search engines rely on them. In this paper, we propose a method for measuring tag confidence so that confident tags can be distinguished from noisy ones. The proposed tag confidence is measured from the visual semantics of the image. To verify the usefulness of the proposed method, experiments were performed with a UGC database collected from social network sites. Experimental results showed that image retrieval performance improved when confident tags were used.
In a mobile consumption environment, users not only want to preview video content through highlights, but also want to consume the attractive segments of a video rather than the whole video. A condensed representation that captures both the overall content and the structure of a video is therefore in demand. In this paper, we propose a video content authoring system that allows content authors to filter the video structure and to compose content and metadata efficiently and effectively. The proposed authoring system consists of two modules: a video analyzer and a metadata generator. The video analyzer detects shot boundaries and scenes and produces temporal segmentation metadata, including shot and scene boundary information. The shot detection adopts adaptive thresholding over multiple windows of different sizes to segment the raw video into shots. The segmented shots are then grouped and merged depending on the similarity between adjacent shots. To minimize the computation time of the shot clustering, we use a span, defined as an aggregation of successive shots, as the unit of computation. The metadata generator allows authors to edit the video metadata in addition to the temporal segmentation metadata produced by the video analyzer. The video metadata supports a hierarchical representation of individual shots and scenes.
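A minimal sketch of the shot-detection step is given below; it is an illustration, not the authors' algorithm. It assumes a simple frame-difference signal and an adaptive threshold of mean + k·std computed over each of several window sizes (the abstract names neither the difference measure nor the threshold rule).

```python
import numpy as np

def frame_diffs(frames):
    """Mean absolute pixel difference between consecutive frames."""
    return np.array([np.abs(frames[i + 1].astype(float) - frames[i].astype(float)).mean()
                     for i in range(len(frames) - 1)])

def detect_shots(diffs, windows=(15, 45), k=3.0):
    """Declare a cut where the difference exceeds mean + k * std of its
    neighborhood for every window size (adaptive, multi-window threshold)."""
    cuts = []
    for i, d in enumerate(diffs):
        if all(d > diffs[max(0, i - w):i + w + 1].mean()
                   + k * diffs[max(0, i - w):i + w + 1].std()
               for w in windows):
            cuts.append(i + 1)  # shot boundary falls before frame i + 1
    return cuts
```

The span-based clustering would then merge adjacent shots whose aggregated features are similar, operating on groups of successive shots rather than individual shots to reduce the computation time.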
KEYWORDS: Multimedia, Video, Chemical species, Computer programming, Data storage, Binary data, Digital video discs, Telecommunications, Visualization, Image storage
In this paper, we propose a storage format that binds digital broadcasts with related data such as TV-Anytime metadata, additional multimedia resources, and personal viewing history. The goal of the proposed format is to make personalized content consumption possible after broadcast content has been recorded to storage devices, e.g., HD DVD and Blu-ray Disc. To achieve this, we adopt the MPEG-4 file format as a container and apply the Binary Format for Scenes (BIFS) for representing and rendering personal viewing history. In addition, TV-Anytime metadata is used to describe broadcasts and to refer to the additional multimedia resources, e.g., images, audio clips, and short video clips. To demonstrate the usefulness of the proposed format, we introduce an application scenario and test the format against it.
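To make the container idea concrete: an MPEG-4 (ISO base media) file is a sequence of boxes, each carrying a 4-byte big-endian size, a 4-byte type, and a payload. The sketch below is a hypothetical illustration rather than the proposed format itself; it wraps TV-Anytime XML in a user-data box, and the inner box type 'tvam' is an invented placeholder, not a registered type.

```python
import struct

def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize one MP4 box: 4-byte size (header included), 4-byte type, payload."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

# TV-Anytime description of the recorded broadcast (content elided).
tva_xml = b"<TVAMain xmlns='urn:tva:metadata:2002'>...</TVAMain>"

# For simplicity the box is appended at the top level of an existing recording;
# in a conforming file, 'udta' normally lives inside the 'moov' box.
with open("recording.mp4", "ab") as f:
    f.write(make_box(b"udta", make_box(b"tvam", tva_xml)))
```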