Compared with other target detection tasks, infrared small target detection suffers from feature-information loss in deep networks because the targets occupy few pixels and lack color and texture features. To address this issue, a Multi-Scale Feature Fusion Attention Network (MSFFA) is proposed to better exploit shallow edge features and deep semantic features. Its main components are a Convolutional Block Attention Module (CBAM), a Multi-Scale Receptive Field Feature Fusion Module (R3FM), and a Bidirectional Feature Aggregation Network (BFANet). CBAM estimates the importance of each feature map and enhances useful features along the channel and spatial dimensions. R3FM characterizes the global context of deep-layer feature maps, enlarging the network's receptive field so that small targets can be detected with a wider range of location information. BFANet shortens the path of information exchange between layers and reinforces the utilization of shallow features. Moreover, the K-means clustering algorithm is adopted to optimize the width-to-height ratios of the anchor boxes, which better matches positive samples and improves training performance. Extensive experiments on a public infrared small target detection dataset demonstrate that the proposed method outperforms other state-of-the-art methods.
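CBAM itself is a standard published module (Woo et al., ECCV 2018) that applies channel attention followed by spatial attention. As an illustration of the attention mechanism the abstract describes, the following is a minimal PyTorch sketch of that standard design; the paper's exact variant and hyperparameters (e.g., reduction ratio, kernel size) may differ.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: weight each feature map by its global importance."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # Shared MLP applied to both pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x)))

class SpatialAttention(nn.Module):
    """Spatial attention: highlight informative locations."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Pool along the channel axis, then convolve to a single attention map.
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        return self.sigmoid(self.conv(torch.cat([avg_out, max_out], dim=1)))

class CBAM(nn.Module):
    """CBAM: channel attention followed by spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)   # reweight channels
        x = x * self.sa(x)   # reweight spatial positions
        return x
```

For small targets, the spatial branch is the critical half: with so few target pixels, an explicit spatial map helps keep faint responses from being averaged away by surrounding background.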
Infrared small target detection (IRSTD) plays an essential role in many fields such as air guidance, tracking, and surveillance. However, because infrared small targets are tiny, easily confused with background noise, and lack clear contours and texture information, learning more discriminative small-target features while suppressing background noise remains a challenging task. In this paper, a context-aware cross-level attention fusion network for IRSTD is proposed. Specifically, a self-attention-induced global context-aware module obtains multilevel attention feature maps with robust positional-relationship modeling. The high-level feature maps, rich in semantic information, are then passed through a multiscale feature refinement module to restore target details and highlight salient features. Feature maps at all levels are fed into a channel and spatial filtering module that compresses redundant information and removes background noise, and the filtered maps are then used for cross-level feature fusion. Furthermore, to overcome the lack of publicly available datasets, a large-scale multiscene infrared small target dataset with high-quality annotations is constructed. Finally, extensive experiments on both public and our self-developed datasets demonstrate the effectiveness of the proposed method and its superiority over other state-of-the-art approaches.
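The abstract does not specify the internals of the filtering and fusion stages, so the following is only an illustrative PyTorch sketch of the general pattern it describes: gate each level's channels to suppress redundancy, upsample the deeper map, and fuse across levels. The module name, the gating design, and concatenation-plus-1x1-convolution fusion are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLevelFusion(nn.Module):
    """Hypothetical sketch: channel-gated filtering of two feature levels,
    followed by upsampling and cross-level fusion."""
    def __init__(self, shallow_ch: int, deep_ch: int, out_ch: int):
        super().__init__()
        # Lightweight channel gates (assumed form of "channel filtering").
        self.gate_shallow = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(shallow_ch, shallow_ch, 1), nn.Sigmoid()
        )
        self.gate_deep = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(deep_ch, deep_ch, 1), nn.Sigmoid()
        )
        self.fuse = nn.Conv2d(shallow_ch + deep_ch, out_ch, 1)

    def forward(self, shallow, deep):
        # Suppress redundant channels before fusion.
        shallow = shallow * self.gate_shallow(shallow)
        deep = deep * self.gate_deep(deep)
        # Match spatial sizes, then fuse shallow detail with deep semantics.
        deep = F.interpolate(deep, size=shallow.shape[-2:],
                             mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([shallow, deep], dim=1))
```

The design intent this sketch captures is the one the abstract states: filter out background responses at each level first, so that noise is not propagated into the fused representation.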