Camera localization is the task of estimating a camera's six-degree-of-freedom pose using the camera itself as the sensor input. It is widely used in augmented reality, autonomous driving, virtual reality, and related applications. In recent years, with the development of deep-learning technology, absolute pose regression has gained wide attention as an end-to-end learning-based localization method. The typical architecture consists of a convolutional backbone and a multilayer perceptron (MLP) regression head composed of multiple fully connected layers. The two-dimensional feature maps extracted by the convolutional backbone must usually be flattened before being passed to the fully connected layers for pose regression. However, this operation discards the crucial pixel position information carried by the two-dimensional feature map and adversely affects the accuracy of pose estimation. We propose a parallel structure, termed SMLoc, that uses a spatial MLP to aggregate position and orientation information separately from the feature maps, reducing the loss of pixel position information. Our approach achieves superior performance on common indoor and outdoor datasets.
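To make the architectural contrast concrete, the sketch below compares a conventional flatten-then-MLP pose head with a spatial-MLP head that keeps pixel positions as a token axis and uses parallel branches for position and orientation. This is a minimal illustration under assumed shapes and layer sizes, not the authors' SMLoc implementation; the class names (`FlattenPoseHead`, `SpatialMLPPoseHead`) and the exact mixing scheme are hypothetical.

```python
# Illustrative sketch (assumed design, not the authors' code): a typical
# flatten-based absolute-pose-regression head versus a spatial-MLP head
# that mixes information along the H*W pixel axis before pooling.
import torch
import torch.nn as nn


class FlattenPoseHead(nn.Module):
    """Typical APR head: flattening (B, C, H, W) -> (B, C*H*W)
    entangles channels and pixel positions in one vector."""
    def __init__(self, c, h, w):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(c * h * w, 1024), nn.ReLU(),
            nn.Linear(1024, 7),  # 3-D translation + 4-D quaternion
        )

    def forward(self, feat):            # feat: (B, C, H, W)
        return self.fc(feat.flatten(1))


class SpatialMLPPoseHead(nn.Module):
    """Spatial-MLP head (sketch): an MLP applied across the spatial
    token dimension, with parallel branches aggregating position and
    orientation cues separately from the same feature map."""
    def __init__(self, c, h, w):
        super().__init__()
        n = h * w

        def spatial_mlp():
            # Operates on the last axis (the H*W pixel tokens),
            # so per-pixel locations remain distinct while mixing.
            return nn.Sequential(nn.Linear(n, n), nn.GELU(), nn.Linear(n, n))

        self.pos_mix, self.rot_mix = spatial_mlp(), spatial_mlp()
        self.pos_fc = nn.Linear(c, 3)   # translation branch
        self.rot_fc = nn.Linear(c, 4)   # orientation (quaternion) branch

    def forward(self, feat):            # feat: (B, C, H, W)
        x = feat.flatten(2)             # (B, C, H*W): pixels as tokens
        p = self.pos_mix(x).mean(dim=2)  # (B, C) position features
        r = self.rot_mix(x).mean(dim=2)  # (B, C) orientation features
        return torch.cat([self.pos_fc(p), self.rot_fc(r)], dim=1)


feat = torch.randn(2, 256, 7, 7)        # dummy backbone feature map
print(SpatialMLPPoseHead(256, 7, 7)(feat).shape)  # torch.Size([2, 7])
```

The key difference is where the mixing happens: the flatten-based head applies its linear layers to one long vector, so the weights cannot distinguish which channel value came from which pixel, whereas the spatial MLP mixes along the pixel axis with channels held fixed, preserving the two-dimensional position structure until the final pooling step.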