From image captioning to video summary using deep recurrent networks and unsupervised segmentation
13 April 2018
Bogdan-Andrei Morosanu, Camelia Lemnaru
Proceedings Volume 10696, Tenth International Conference on Machine Vision (ICMV 2017); 106960P (2018) https://doi.org/10.1117/12.2310071
Event: Tenth International Conference on Machine Vision, 2017, Vienna, Austria
Abstract
Automatic captioning systems based on recurrent neural networks have been tremendously successful at providing realistic natural language captions for complex and varied image data. We explore methods for adapting existing models trained on large image caption data sets to a related problem: summarising videos using natural language descriptions and frame selection. These architectures create internal high-level representations of the input image that can be used to define probability distributions and distance metrics on these distributions. Specifically, we interpret each hidden unit inside a layer of the caption model as representing the un-normalised log probability of some unknown image feature of interest for the caption generation process. We can then apply well-understood statistical divergence measures to express the difference between images and create an unsupervised segmentation of video frames, classifying consecutive images of low divergence as belonging to the same context, and those of high divergence as belonging to different contexts. The final summary of the video consists of a group of selected frames and an accompanying text description, allowing a user to quickly explore large unlabeled video databases.
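The segmentation idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the activations are normalised, which divergence is used, or how frames are selected. The choices below are therefore assumptions made for illustration only: a softmax over the hidden units, the Jensen-Shannon divergence, a hypothetical `threshold` of 0.1, and the middle frame of each segment as its representative. The per-frame activation vectors are assumed to have already been extracted from a layer of a caption model.

```python
import math


def softmax(activations):
    """Normalise un-normalised log probabilities into a categorical distribution."""
    peak = max(activations)  # subtract the maximum for numerical stability
    exps = [math.exp(a - peak) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (assumed choice): symmetric, bounded by log(2)."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def segment_frames(frame_activations, threshold=0.1):
    """Group consecutive frame indices into contexts.

    A new segment starts whenever the divergence between two neighbouring
    frames exceeds `threshold` (a hypothetical value, not from the paper).
    """
    if not frame_activations:
        return []
    dists = [softmax(a) for a in frame_activations]
    segments = [[0]]
    for i in range(1, len(dists)):
        if js_divergence(dists[i - 1], dists[i]) > threshold:
            segments.append([i])      # high divergence: new context
        else:
            segments[-1].append(i)    # low divergence: same context
    return segments


def representative_frames(segments):
    """Pick the middle frame of each segment as a stand-in key frame."""
    return [seg[len(seg) // 2] for seg in segments]


if __name__ == "__main__":
    # Toy activations: two frames dominated by unit 0, then two by unit 1.
    frames = [[5.0, 0.0, 0.0], [4.8, 0.1, 0.0], [0.0, 5.0, 0.0], [0.1, 4.9, 0.0]]
    segments = segment_frames(frames)
    print(segments, representative_frames(segments))
```

In a full pipeline, each selected frame would then be passed to the caption model to produce the text description that accompanies it. That step is omitted here because it depends on the specific model used.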
© (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Bogdan-Andrei Morosanu and Camelia Lemnaru "From image captioning to video summary using deep recurrent networks and unsupervised segmentation", Proc. SPIE 10696, Tenth International Conference on Machine Vision (ICMV 2017), 106960P (13 April 2018); https://doi.org/10.1117/12.2310071
KEYWORDS
Image segmentation
Video
Data modeling
Statistical modeling
Image classification
Image processing algorithms and systems
Image processing