Paper
22 May 2024 Rethinking the expressiveness of small objects in UAV scenes via adaptive multi-scale and context-aware feature fusion
Xinghua Wang, Weilin Liu, Lunqian Wang, Zekai Zhang, Jinglin Zhang, Xing Wang
Proceedings Volume 13176, Fourth International Conference on Machine Learning and Computer Application (ICMLCA 2023); 131763G (2024) https://doi.org/10.1117/12.3028972
Event: Fourth International Conference on Machine Learning and Computer Application (ICMLCA 2023), 2023, Hangzhou, China
Abstract
Compared to conventional observations, Unmanned Aerial Vehicle (UAV) observations often contain a large number of small objects that are blurry and lack sufficient feature detail. In this paper, a Dynamic Meta Tandem Transformer (DMTT) network is proposed to address the small-object problem in UAV scenes. We design an Adaptive Cross-dimensional Attention (ACA) to dynamically focus on important feature information across multiple dimensions, and a Tandem Pooling Layer (TPL) to reduce the number of parameters while maintaining high accuracy. Based on ACA and TPL, Attention Meta Feature Fusion (AMFF) and an Adaptive Tandem Transformer (ATT) are proposed. AMFF captures fine-grained spatial information to compensate for the lack of information on small objects. ATT aggregates global contextual information to enhance the positional information of small objects. In extensive experiments on the VisDrone dataset, DMTT improves AP by 1.3% over the state-of-the-art model; on the UAVDT dataset, it improves AP by 2.3% over the state-of-the-art model. Meanwhile, DMTT reaches 30.1 FPS on VisDrone and 30.5 FPS on UAVDT. The results show that DMTT achieves both accuracy and speed. Moreover, DMTT improves by 0.3% (AP 46.9%) on the MS-COCO dataset, which is encouraging and competitive.
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Xinghua Wang, Weilin Liu, Lunqian Wang, Zekai Zhang, Jinglin Zhang, and Xing Wang "Rethinking the expressiveness of small objects in UAV scenes via adaptive multi-scale and context-aware feature fusion", Proc. SPIE 13176, Fourth International Conference on Machine Learning and Computer Application (ICMLCA 2023), 131763G (22 May 2024); https://doi.org/10.1117/12.3028972
KEYWORDS
Unmanned aerial vehicles
Object detection
Transformers
Feature fusion
Convolution
Head
Feature extraction