Most existing multi-object tracking models focus only on the spatial information in image-level input while ignoring necessary temporal information. The motion information between consecutive frames effectively reflects a target's motion status, which is essential for improving a model's robustness to occlusion and motion blur. We propose MISTracker to supplement the original tracking model with motion information. Specifically, we divide multi-scale feature maps into two categories from the perspectives of spatial and channel information, and propose the spatial-level frame-difference processing (SFDP) module and the channel-level frame-difference processing (CFDP) module to handle the differences between consecutive frames, respectively. SFDP processes the differences from the spatial perspective, supplementing motion information by perceiving pixel-level changes in the feature maps. CFDP processes the differences from the channel perspective, enhancing motion-sensitive channels according to the overall pixel differences of each channel. Finally, temporal and motion information complement each other after upsampling fusion. The whole process is realized with simple convolutions, which keeps the computational cost as low as possible while enhancing the model's tracking performance.
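The abstract does not give the exact formulation of the two modules, but the described idea (pixel-level frame differences gating spatial positions in SFDP, and per-channel overall differences reweighting motion-sensitive channels in CFDP) can be sketched as follows. This is a minimal NumPy illustration under assumed details: the sigmoid gating, the mean reductions, and the function names `sfdp`/`cfdp` are all hypothetical, not the paper's implementation.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sfdp(prev_feat, curr_feat):
    """Hypothetical spatial-level frame-difference processing.

    prev_feat, curr_feat: feature maps of shape (C, H, W) from two
    consecutive frames. The pixel-level difference is collapsed over
    channels into a spatial attention map that reweights the current
    features, supplementing motion information at moving locations.
    """
    diff = np.abs(curr_feat - prev_feat)      # (C, H, W) pixel-level changes
    spatial_map = diff.mean(axis=0)           # (H, W) per-position motion cue
    attn = _sigmoid(spatial_map)              # gate in (0, 1)
    return curr_feat * attn[None, :, :]       # broadcast over channels

def cfdp(prev_feat, curr_feat):
    """Hypothetical channel-level frame-difference processing.

    The overall pixel difference of each channel is reduced to a single
    weight, so channels that respond strongly to motion are enhanced.
    """
    diff = np.abs(curr_feat - prev_feat)      # (C, H, W)
    per_channel = diff.mean(axis=(1, 2))      # (C,) overall difference per channel
    weights = _sigmoid(per_channel)           # motion-sensitivity weights
    return curr_feat * weights[:, None, None] # broadcast over spatial dims
```

In the paper these reductions and gates would be learned with simple convolutions rather than fixed means and sigmoids; the sketch only shows how frame differences can drive spatial and channel reweighting.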
Keywords: Motion models, Education and training, Target detection, Data modeling, Performance modeling, Feature extraction, Convolution