A video retrieval algorithm with embedded position coding for moment localization and highlight detection
2 December 2024
Fei Song, Yude Wang, Ting Zhao, Teng Liu
Proceedings Volume 13443, Fifth International Conference on Computer Vision and Information Technology (CVIT 2024); 1344306 (2024) https://doi.org/10.1117/12.3055603
Event: 2024 5th International Conference on Computer Vision and Information Technology (CVIT 2024), 2024, Beijing, China
Abstract
Existing video retrieval algorithms capture intra-modal context information incompletely. To address this, this paper introduces sine-cosine 2D position coding into the unimodal encoders to capture fine-grained local details and improve the accuracy of moment localization and highlight detection. First, the video and text are processed to extract visual, audio, and text features. The visual and audio features are each fed into a unimodal encoder with embedded sine-cosine 2D position coding to capture global temporal relationships, and the encoded features are then fused across modalities. The fused features and the text features are then used to generate always-aligned queries. Finally, query decoding and prediction produce the moment localization and highlight detection results, completing the video retrieval. The proposed method was evaluated experimentally on four datasets: QVHighlights, Charades-STA, TVSum, and YouTube Highlights. On the QVHighlights dataset, the mAP for the moment localization and highlight detection tasks reached 39.09 and 39.68, respectively. On the Charades-STA dataset, Recall@1 at an IoU threshold of 0.5 reached 51.13. On the TVSum and YouTube Highlights datasets, the mAP reached 83.7 and 75.3, respectively. This work provides theoretical support for the realization of video retrieval technology.
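The abstract does not spell out how the sine-cosine 2D position coding is constructed, so the following is only a minimal sketch of one common construction, not the authors' implementation. It assumes the standard sinusoidal formulation with base 10000 and encodes a 2D grid by concatenating a 1D encoding of the row index with a 1D encoding of the column index. The function names `sincos_1d` and `sincos_2d`, the grid size, and the choice of what the two axes represent are all hypothetical.

```python
import math


def sincos_1d(position, dim, base=10000.0):
    """Sinusoidal encoding of one scalar position (assumed base 10000).

    Returns a list of length `dim` with interleaved sin/cos values.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    vec = []
    for i in range(dim // 2):
        # Frequencies decrease geometrically with the channel index.
        freq = 1.0 / (base ** (2 * i / dim))
        vec.append(math.sin(position * freq))
        vec.append(math.cos(position * freq))
    return vec


def sincos_2d(height, width, dim):
    """2D encoding for a height x width grid (hypothetical helper).

    The first dim/2 channels encode the row index and the last dim/2
    channels encode the column index.
    """
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    half = dim // 2
    return [
        [sincos_1d(r, half) + sincos_1d(c, half) for c in range(width)]
        for r in range(height)
    ]


# Example: a 4 x 6 grid with 8-dimensional position vectors.
pe = sincos_2d(height=4, width=6, dim=8)
```

In a setting like the one described, such a table would typically be added to the visual or audio feature map before the unimodal encoder; which axes the paper actually uses is not stated in the abstract.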
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Fei Song, Yude Wang, Ting Zhao, and Teng Liu "A video retrieval algorithm with embedded position coding for moment localization and highlight detection", Proc. SPIE 13443, Fifth International Conference on Computer Vision and Information Technology (CVIT 2024), 1344306 (2 December 2024); https://doi.org/10.1117/12.3055603