Visual object tracking has attracted considerable interest owing to its applications in numerous fields such as industry and security. Because illumination changes can cause RGB trackers to fail, a growing number of researchers have turned to RGB-T tracking, which fuses the visible and thermal infrared spectra, and its development has accelerated in recent years. To exploit the complementary information of the two modalities adaptively, this paper designs a weight-aware dual-modal feature aggregation mechanism and, on that basis, proposes the WF DiMP algorithm for RGB-T tracking. In WF DiMP, deep features of the visible and thermal infrared images are extracted by ResNet50 and used to produce heterogeneous response maps, from which the dual-modal weights are learned adaptively. The weighted deep features are then concatenated and fed to the classifier and to the bounding box estimation module of the DiMP (Discriminative Model Prediction) network, which output the final confidence map and the object bounding box, respectively. Experiments on the VOT-RGBT2019 dataset show that WF DiMP achieves higher tracking accuracy and robustness: its precision rate (PR) and success rate (SR) reach 82.1% and 56.3%, respectively, demonstrating the effectiveness of the proposed mechanism.
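The weighting-and-concatenation step can be illustrated with a minimal Python sketch. It assumes one scalar weight per modality and, as a hypothetical stand-in for the learned weighting (the abstract does not specify how the weights are computed), scores each response map by its peak value and converts the two scores into weights with a softmax. The function names (`peak_score`, `modality_weights`, `weighted_concat`) and the toy tensors are illustrative only and are not the WF DiMP implementation.

```python
import math
from typing import List, Tuple

# A 2D map is a list of rows; a feature tensor is a list of 2D channels.
Map2D = List[List[float]]
Feature = List[Map2D]


def peak_score(response: Map2D) -> float:
    """Return the maximum value of a response map.

    Hypothetical reliability cue: a higher peak is taken to mean the
    modality localizes the target more confidently.
    """
    return max(max(row) for row in response)


def modality_weights(resp_rgb: Map2D, resp_t: Map2D,
                     temperature: float = 1.0) -> Tuple[float, float]:
    """Map the two peak scores to weights that sum to 1 via a softmax.

    Assumption: a stand-in for the learned weighting, not the paper's formula.
    """
    s_rgb = peak_score(resp_rgb) / temperature
    s_t = peak_score(resp_t) / temperature
    m = max(s_rgb, s_t)  # subtract the max for numerical stability
    e_rgb = math.exp(s_rgb - m)
    e_t = math.exp(s_t - m)
    total = e_rgb + e_t
    return e_rgb / total, e_t / total


def scale(feature: Feature, w: float) -> Feature:
    """Multiply every element of every channel by the scalar weight w."""
    return [[[w * v for v in row] for row in channel] for channel in feature]


def weighted_concat(feat_rgb: Feature, feat_t: Feature,
                    w_rgb: float, w_t: float) -> Feature:
    """Weight each modality, then concatenate along the channel axis."""
    return scale(feat_rgb, w_rgb) + scale(feat_t, w_t)


# Toy example: 2x2 response maps and one-channel 2x2 features per modality.
resp_rgb = [[0.1, 0.2], [0.3, 0.4]]   # weak visible response (e.g. low light)
resp_t = [[0.1, 0.9], [0.2, 0.3]]     # confident thermal response
feat_rgb = [[[1.0, 2.0], [3.0, 4.0]]]
feat_t = [[[5.0, 6.0], [7.0, 8.0]]]

w_rgb, w_t = modality_weights(resp_rgb, resp_t)
fused = weighted_concat(feat_rgb, feat_t, w_rgb, w_t)
# The thermal map has the higher peak, so it receives the larger weight,
# and the fused tensor has 1 + 1 = 2 channels.
print(f"w_rgb={w_rgb:.3f}, w_t={w_t:.3f}, channels={len(fused)}")
```

Concatenating the weighted features, rather than summing them, keeps the two modalities in separate channels, so downstream modules such as a classifier can still learn modality-specific filters while the weights down-scale whichever modality is currently less reliable.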