Paper
14 April 2023 A winter sports video classification method and system based on 3D-video Swin transformer
Xiaolin Yuan
Proceedings Volume 12634, International Conference on Optics and Machine Vision (ICOMV 2023); 126340R (2023) https://doi.org/10.1117/12.2678658
Event: International Conference on Optics and Machine Vision (ICOMV 2023), 2023, Changsha, China
Abstract
Videos of winter sports can be a valuable resource for learning about sports in China. In this paper, we address the problems of a small winter sports video dataset and low classification accuracy by proposing a fast video classification approach that combines a transformer with a convolutional neural network. The 3D-Video Swin Transformer model is built from a ResNet3D feature extraction component and the Video Swin Transformer, and it improves local and global modeling capabilities through a multi-head self-attention mechanism. Convolution operations are used at the network's front end to make up for the Transformer's lack of inductive bias, which strengthens the network's capacity for local modeling and reduces the model's dependence on large amounts of data. Experimental results show that the 3D-Video Swin Transformer model achieves an accuracy of up to 76.43 percent on the winter sports video dataset developed in this paper, which is 1.15 percent higher than that of the Video Swin Transformer network and reflects a clear improvement in classification performance. Additionally, we develop and implement a winter sports video classification system based on the Milvus database to facilitate user interaction and to support the submission, classification, and recommendation of winter sports videos.
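The recommendation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract states only that the system is based on the Milvus database; it does not describe the embedding dimension, the similarity metric, or the query logic. The sketch below therefore swaps Milvus for a brute-force in-memory search with cosine similarity, and every name in it (`cosine_similarity`, `recommend`, the video IDs, and the three-dimensional embeddings) is hypothetical and chosen for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_embedding, library, top_k=2):
    """Return the IDs of the top_k library videos most similar to the query.

    Assumption: a vector database such as Milvus would perform this nearest-
    neighbor search at scale; here it is done by brute force over a dict.
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), video_id)
        for video_id, embedding in library.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [video_id for _, video_id in scored[:top_k]]


# Hypothetical toy embeddings; a real system would take them from the
# classification network's feature layer.
library = {
    "ski_jump_01": [0.9, 0.1, 0.0],
    "curling_01": [0.0, 0.2, 0.9],
    "ski_jump_02": [0.8, 0.2, 0.1],
    "ice_hockey_01": [0.1, 0.9, 0.2],
}
query = [1.0, 0.1, 0.0]

print(recommend(query, library, top_k=2))  # ['ski_jump_01', 'ski_jump_02']
```

In a deployment along the lines the abstract describes, the dictionary lookup would presumably be replaced by an indexed similarity search in the database, but the abstract does not confirm which metric or index is used.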
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Xiaolin Yuan "A winter sports video classification method and system based on 3D-video Swin transformer", Proc. SPIE 12634, International Conference on Optics and Machine Vision (ICOMV 2023), 126340R (14 April 2023); https://doi.org/10.1117/12.2678658
KEYWORDS
Video
Transformers
3D modeling
Data modeling
Convolutional neural networks
Education and training
Feature extraction