Presentation + Paper
Self-supervised monocular depth and ego-motion estimation for CT-bronchoscopy fusion
29 March 2024
Abstract
The management of lung cancer necessitates robust diagnostic tools, with three-dimensional (3D) computed tomography (CT) imaging and bronchoscopy standing as pivotal complementary resources. Bronchoscopy captures live endobronchial video, providing striking detail of the airway tree’s interior, while 3D CT scans contribute extensive anatomical knowledge. A significant gap persists, however, in linking these data-rich sources, such as in the fusion of video data from bronchoscopic airway exams and airway surface data from 3D CT scans. The main issue is the difficulty in simultaneously acquiring depth and camera pose information for bronchoscopic video frames. A solution to this problem can facilitate CT-video fusion/rendering, multimodal registration, and 3D cancer lesion localization. Deep-learning networks have recently been employed to estimate the depth and ego-motion information. Unfortunately, it is challenging to acquire the required training data, consisting of ground-truth pairs of bronchoscopic video frames and corresponding depth maps. Along this line, generative adversarial networks (GANs) have shown promise in domain transformation from CT-based endoluminal surface views into synthesized bronchoscopic frames. These synthesized views are subsequently aligned with their CT-derived depth maps, generating valuable training data. Nonetheless, such domain transformation techniques fail to utilize frame sequence knowledge and supply no information about the camera’s ego-motion. Parallel studies in other domains, such as endoscopy, have exploited photometric consistency between adjacent frames to jointly estimate depth and ego-motion. Nevertheless, the texture-less and smooth endoluminal surface inside the airway restricts the generation of distinct depth maps with enhanced clarity and detail. To address this problem, we present a self-supervised training strategy that incorporates both domain transformation and photometric consistency for the Monodepth2 deep learning architecture, improving the depth and ego-motion prediction of bronchoscopic video frames. Results on well-registered test data show that the proposed strategy achieves clear and precise predictions. In addition, effective reference scaling factors are summarized from the test dataset, enabling real-world applications, such as 3D surface reconstruction, camera trajectory generation, and fusion between CT and bronchoscopic video.
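The photometric-consistency signal mentioned in the abstract is the standard self-supervised objective used in Monodepth2-style training: a neighboring frame is warped into the current view using the predicted depth and relative camera pose, and the warped image is compared to the observed frame. The sketch below is illustrative only (not the authors' code) and rests on common assumptions: PyTorch, a pinhole intrinsic matrix K, a 4x4 relative pose matrix, and the usual 0.85 SSIM / 0.15 L1 weighting; Monodepth2's per-pixel minimum reprojection and auto-masking terms are omitted for brevity.

# A minimal, illustrative sketch of a Monodepth2-style photometric reprojection loss.
# Assumptions (not from the paper): pinhole intrinsics K, predicted 4x4 relative pose,
# 0.85 SSIM / 0.15 L1 weighting; minimum reprojection and auto-masking omitted.
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    # Lift every pixel of the target frame to a 3D point using its predicted depth.
    # depth: (B, 1, H, W); K_inv: (3, 3) inverse camera intrinsics.
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=depth.dtype, device=depth.device),
        torch.arange(w, dtype=depth.dtype, device=depth.device), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)   # (3, H*W)
    cam = (K_inv @ pix).unsqueeze(0) * depth.reshape(b, 1, -1)               # (B, 3, H*W)
    return torch.cat([cam, torch.ones_like(cam[:, :1, :])], dim=1)           # (B, 4, H*W)

def project(points, K, T, h, w):
    # Transform the 3D points by the relative pose T (B, 4, 4), project them with K,
    # and return normalized sampling coordinates for grid_sample.
    cam = (T @ points)[:, :3, :]
    pix = K @ cam
    pix = pix[:, :2, :] / (pix[:, 2:3, :] + 1e-7)
    x = 2.0 * pix[:, 0, :] / (w - 1) - 1.0
    y = 2.0 * pix[:, 1, :] / (h - 1) - 1.0
    return torch.stack([x, y], dim=-1).reshape(points.shape[0], h, w, 2)

def ssim_dissimilarity(x, y):
    # Simplified single-scale SSIM over 3x3 windows, returned as a dissimilarity map.
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    var_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    cov = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return torch.clamp((1 - num / den) / 2, 0, 1)

def photometric_loss(target, source, depth, pose, K, K_inv):
    # Warp the neighboring (source) frame into the target view with the predicted depth
    # and pose, then penalize the appearance difference (0.85 SSIM + 0.15 L1).
    b, _, h, w = target.shape
    grid = project(backproject(depth, K_inv), K, pose, h, w)
    warped = F.grid_sample(source, grid, padding_mode="border", align_corners=True)
    l1 = (warped - target).abs().mean(1, keepdim=True)
    dssim = ssim_dissimilarity(warped, target).mean(1, keepdim=True)
    return (0.85 * dssim + 0.15 * l1).mean()

Minimizing such a loss over adjacent bronchoscopic frames is what lets depth and pose networks be trained without ground-truth depth; the GAN-based domain transformation described in the abstract then supplies the complementary CT-derived supervision.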
© (2024) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Qi Chang and William E. Higgins "Self-supervised monocular depth and ego-motion estimation for CT-bronchoscopy fusion", Proc. SPIE 12928, Medical Imaging 2024: Image-Guided Procedures, Robotic Interventions, and Modeling, 129280B (29 March 2024); https://doi.org/10.1117/12.3004499
KEYWORDS: Education and training, Depth maps, Cameras, Video, 3D modeling, Computed tomography, Bronchoscopy