14 September 2022 MDSNet: self-supervised monocular depth estimation for video sequences using self-attention and threshold mask
Jiaqi Zhao, Chaoyue Zhao, Chunling Liu, Chaojian Zhang, Wang Zhang
Author Affiliations +
Abstract

Self-supervised monocular depth estimation still yields ambiguous predictions in low-texture regions and at object boundaries. To address this problem, we propose MDSNet, a monocular depth self-supervised network that integrates three effective strategies into a novel self-supervised framework: (1) an attention mechanism and a feature fusion module enhance the semantic and spatial information of the feature maps; (2) a threshold segmentation mask handles object motion and low-texture regions, recovering image detail; and (3) a residual pose module and a deep reconstruction loss strengthen the model's feature extraction capability, improving the accuracy of both depth and pose estimation. Comprehensive experiments and visual analyses demonstrate the effectiveness of each component in isolation. Compared with existing self-supervised methods, our model not only achieves outstanding results on the KITTI and NYU Depth V2 datasets but also generalizes to different environments.

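The abstract does not specify the exact form of the threshold segmentation mask. As a minimal illustrative sketch only, a common formulation of such a mask in self-supervised depth estimation (in the spirit of Monodepth2's auto-masking) keeps a pixel in the photometric loss only when warping the source frame with the predicted depth and pose explains that pixel better than the unwarped source frame does. The function name `threshold_mask` and all numeric values below are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def threshold_mask(reprojection_error, identity_error):
    """Binary per-pixel mask for a photometric loss (illustrative sketch).

    Keeps a pixel only if the reprojection error after warping with the
    predicted depth/pose is lower than the error of the unwarped source
    frame. Pixels that fail this test (e.g. objects moving at the same
    speed as the camera, or textureless regions where warping cannot
    reduce the error) are excluded from the loss.
    """
    return (reprojection_error < identity_error).astype(np.float32)

# Toy 2x2 per-pixel error maps (hypothetical values):
reproj = np.array([[0.10, 0.50],
                   [0.20, 0.30]])
ident = np.array([[0.40, 0.40],
                  [0.40, 0.10]])

mask = threshold_mask(reproj, ident)
# mask is 1 where warping reduced the error, 0 elsewhere:
# [[1. 0.]
#  [1. 0.]]
```

Masking out such pixels prevents a degenerate solution in which the network predicts infinite depth for objects that appear static between frames.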
© 2022 SPIE and IS&T
Jiaqi Zhao, Chaoyue Zhao, Chunling Liu, Chaojian Zhang, and Wang Zhang "MDSNet: self-supervised monocular depth estimation for video sequences using self-attention and threshold mask," Journal of Electronic Imaging 31(5), 053013 (14 September 2022). https://doi.org/10.1117/1.JEI.31.5.053013
Received: 15 March 2022; Accepted: 30 August 2022; Published: 14 September 2022
KEYWORDS: Video, Cameras, Image segmentation, Convolution, Image fusion, Performance modeling