Infrared and visible image fusion combines the thermal radiation and texture information captured by two different sensor types into a single image. In recent years, deep-learning methods based on convolutional neural networks (CNNs) have become the mainstream approach to infrared and visible image fusion, but they often extract only shallow features and neglect long-range dependencies in the fusion task. Because of its local receptive field, a CNN can capture global contextual information only by stacking many convolutional layers, which lowers network efficiency and makes optimization difficult. To address this issue, we propose a global context fusion network (GCFN) that models context with global attention pooling and adopts a two-stage strategy. First, a GCFN-based autoencoder is trained to extract multi-scale local and global contextual features. Second, to effectively incorporate the complementary information of the input images, a dual-branch fusion network combining a CNN and a transformer is designed. Experimental results on a publicly available dataset demonstrate that the proposed method outperforms nine state-of-the-art methods in both subjective and objective evaluations.
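To illustrate the idea of modeling context with global attention pooling, the sketch below shows a GCNet-style global context block in PyTorch: attention pooling over all spatial positions produces a single context vector, which is transformed and broadcast back to every pixel. This is a minimal, hypothetical example; the class name `GlobalContextBlock`, the channel-reduction ratio, and all other details are assumptions for illustration, not the authors' actual GCFN implementation.

```python
import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    """Illustrative global-context block (GCNet-style, not the paper's exact design):
    attention pooling over all spatial positions, a channel transform,
    and a broadcast residual addition."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)  # per-pixel attention logits
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.LayerNorm([channels // reduction, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Attention pooling: softmax over all H*W positions gives one global context vector
        weights = torch.softmax(self.attn(x).view(b, 1, h * w), dim=-1)        # (B, 1, HW)
        context = torch.bmm(x.view(b, c, h * w), weights.transpose(1, 2))      # (B, C, 1)
        context = context.view(b, c, 1, 1)
        # Channel transform, then add the global context to every spatial position
        return x + self.transform(context)

# Usage: enrich a 64-channel feature map with global context
feat = torch.randn(2, 64, 32, 32)
out = GlobalContextBlock(64)(feat)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Because the context vector is pooled once over the whole feature map and shared by every position, long-range dependencies are captured without stacking many convolutional layers, which is the efficiency argument made above.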
Keywords: Image fusion, Feature fusion, Infrared imaging, Infrared radiation, Visible radiation, Transformers, Feature extraction