18 May 2023 FPDT: a multi-scale feature pyramidal object detection transformer
Kailai Huang, Mi Wen, Chen Wang, Lina Ling
Abstract

Object detection is a fundamental part of autonomous driving algorithms, and with the rise of transformers over the past few years, numerous computer vision methods have integrated transformers into object detectors to achieve better generalization. Building a pure transformer-based detector seems an attractive choice; however, transformers are not omnipotent, and they come with serious drawbacks. Their fundamental operator, multi-head self-attention (MHSA), is computationally expensive because of its quadratic complexity, which leads to unreasonably high memory usage and critically low throughput. To address this issue, we use convolution operations to simulate MHSA, transferring its philosophy and principles to convolutional neural networks (CNNs). This yields a detector that is both powerful and fast. Furthermore, a multi-scale pyramidal feature extractor gives the detector a better view of objects at various scales. Overall, our proposed object detector follows the philosophy of the attention mechanism: a multi-scale feature pyramidal CNN encoder simulates the transformer, and a real transformer query neck extracts all of the objects at once and feeds them to the output heads. By combining the construction philosophy of object detectors with the philosophy and characteristics of the transformer, our FPDT-Tiny, trained on the COCO2017 dataset, achieves an average precision (AP) of up to 34.1 within 150 epochs, which is 16.0 and 10.8 higher than the CNN-based YOLOv3-Base and SSD-300, respectively. Our FPDT-Small achieves an AP of up to 37.7 under the same number of epochs, which is 10.4 and 7.9 higher than the transformer-based detectors YOLOS-Small and DETR-ResNet-152, respectively, also demonstrating comparable performance.
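The quadratic cost attributed to MHSA above can be illustrated with a rough back-of-the-envelope count. The sketch below is not taken from the paper: the function names, the patch size of 16, and the 3x3 single-channel convolution are all illustrative assumptions. It only compares how the number of attention-score entries and the number of convolution multiply-adds grow with image resolution.

```python
# Illustrative sketch only; names and default values are assumptions,
# not the FPDT implementation.

def num_tokens(height, width, patch=16):
    """Tokens produced by splitting an image into non-overlapping patches."""
    return (height // patch) * (width // patch)


def attention_entries(height, width, patch=16, heads=1):
    """Entries in the token-by-token score matrices of one self-attention layer."""
    n = num_tokens(height, width, patch)
    return heads * n * n  # quadratic in the number of tokens


def conv_mult_adds(height, width, kernel=3, c_in=1, c_out=1):
    """Multiply-adds of one stride-1, same-padded convolution layer."""
    return height * width * kernel * kernel * c_in * c_out  # linear in pixels


if __name__ == "__main__":
    for side in (512, 1024):
        print(side,
              num_tokens(side, side),
              attention_entries(side, side),
              conv_mult_adds(side, side))
```

Under these assumptions, doubling the image side length multiplies the attention-score count by 16 but the convolution cost by only 4, which is the scaling gap that motivates simulating MHSA with convolutions.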

© 2023 Society of Photo-Optical Instrumentation Engineers (SPIE)
Kailai Huang, Mi Wen, Chen Wang, and Lina Ling "FPDT: a multi-scale feature pyramidal object detection transformer," Journal of Applied Remote Sensing 17(2), 026510 (18 May 2023). https://doi.org/10.1117/1.JRS.17.026510
Received: 25 December 2022; Accepted: 28 April 2023; Published: 18 May 2023
KEYWORDS
Transformers
Object detection
Education and training
Head
Neck
Feature extraction
Convolution