14 September 2022 MDSNet: self-supervised monocular depth estimation for video sequences using self-attention and threshold mask
Jiaqi Zhao, Chaoyue Zhao, Chunling Liu, Chaojian Zhang, Wang Zhang
Author Affiliations +
Abstract

Self-supervised monocular depth estimation still yields ambiguous predictions in low-texture regions and at object boundaries. To address this problem, we propose MDSNet, a monocular depth self-supervised network that integrates three effective strategies into a novel self-supervised framework: (1) an attention mechanism and a feature fusion module enhance the semantic and spatial information of the feature maps; (2) a threshold segmentation mask handles object motion and low-texture regions, recovering image detail; and (3) a residual pose module and a deep reconstruction loss strengthen the model's feature extraction capability, improving the accuracy of both depth and pose estimation. Comprehensive experiments and visual analyses demonstrate the effectiveness of each component in isolation. Compared with existing self-supervised methods, our model not only achieves outstanding results on the KITTI and NYU Depth V2 datasets but also generalizes to different environments.

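The abstract does not specify the exact form of the threshold segmentation mask. As a minimal illustrative sketch only, a common formulation of such a mask in self-supervised depth estimation (in the spirit of Monodepth2's auto-masking) keeps a pixel in the photometric loss only when warping the source frame with the predicted depth and pose explains that pixel better than the unwarped source frame does. The function name `threshold_mask` and all numeric values below are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def threshold_mask(reprojection_error, identity_error):
    """Binary per-pixel mask for a photometric loss (illustrative sketch).

    Keeps a pixel only if the reprojection error after warping with the
    predicted depth/pose is lower than the error of the unwarped source
    frame. Pixels that fail this test (e.g. objects moving at the same
    speed as the camera, or textureless regions where warping cannot
    reduce the error) are excluded from the loss.
    """
    return (reprojection_error < identity_error).astype(np.float32)

# Toy 2x2 per-pixel error maps (hypothetical values):
reproj = np.array([[0.10, 0.50],
                   [0.20, 0.30]])
ident = np.array([[0.40, 0.40],
                  [0.40, 0.10]])

mask = threshold_mask(reproj, ident)
# mask is 1 where warping reduced the error, 0 elsewhere:
# [[1. 0.]
#  [1. 0.]]
```

Masking out such pixels prevents a degenerate solution in which the network predicts infinite depth for objects that appear static between frames.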
© 2022 SPIE and IS&T
Jiaqi Zhao, Chaoyue Zhao, Chunling Liu, Chaojian Zhang, and Wang Zhang "MDSNet: self-supervised monocular depth estimation for video sequences using self-attention and threshold mask," Journal of Electronic Imaging 31(5), 053013 (14 September 2022). https://doi.org/10.1117/1.JEI.31.5.053013
Received: 15 March 2022; Accepted: 30 August 2022; Published: 14 September 2022
KEYWORDS: Video, Cameras, Image segmentation, Convolution, Image fusion, Performance modeling