Poster + Paper
Deformable attention-guided network for multitask learning
22 November 2024
Conference Poster
Abstract
Depth estimation and semantic segmentation are crucial for visual perception and scene understanding. Multi-task learning, which captures shared features across multiple tasks within a scene, is often applied to depth estimation and semantic segmentation to jointly improve accuracy. In this paper, a deformable attention-guided network for multi-task learning is proposed to enhance the accuracy of both depth estimation and semantic segmentation. The primary network architecture consists of a shared encoder, initial prediction modules, deformable attention modules, and decoders. RGB images are first input into the shared encoder to extract generic representations for the different tasks. These shared feature maps are then decoupled into depth, semantic, edge, and surface-normal features in the initial prediction module. At each stage, attention is applied to the depth and semantic features under the guidance of fusion features in the deformable attention module. The decoder upsamples each deformable attention-enhanced feature map and outputs the final predictions. The proposed model achieves an mIoU of 44.25% and an RMSE of 0.5183, outperforming the single-task baseline, the multi-task baseline, and a state-of-the-art multi-task learning model.
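To make the described pipeline concrete, the following is a minimal PyTorch-style sketch of the architecture the abstract outlines (shared encoder, initial prediction module, fusion-guided attention, task decoders). All module names (InitialPred, FusionGuidedAttention, MultiTaskNet) and sizes are illustrative assumptions, and the deformable attention block is replaced by a simplified channel-gating stand-in; this is not the authors' implementation.

```python
# Illustrative sketch only; module names and the simplified attention are assumptions.
import torch
import torch.nn as nn

class InitialPred(nn.Module):
    """Decouples shared features into task-specific feature maps."""
    def __init__(self, ch):
        super().__init__()
        self.heads = nn.ModuleDict({
            t: nn.Conv2d(ch, ch, 3, padding=1)
            for t in ("depth", "semantic", "edge", "normal")
        })

    def forward(self, shared):
        return {t: head(shared) for t, head in self.heads.items()}

class FusionGuidedAttention(nn.Module):
    """Simplified stand-in for the deformable attention module:
    fused features from all tasks gate the task feature map channel-wise."""
    def __init__(self, ch):
        super().__init__()
        self.fuse = nn.Conv2d(4 * ch, ch, 1)
        self.gate = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, task_feat, all_feats):
        fused = self.fuse(torch.cat(list(all_feats.values()), dim=1))
        return task_feat * self.gate(fused) + task_feat

class MultiTaskNet(nn.Module):
    def __init__(self, ch=64, num_classes=40):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.initial_pred = InitialPred(ch)
        self.attn_depth = FusionGuidedAttention(ch)
        self.attn_sem = FusionGuidedAttention(ch)
        self.dec_depth = nn.Conv2d(ch, 1, 1)          # depth regression head
        self.dec_sem = nn.Conv2d(ch, num_classes, 1)  # segmentation head

    def forward(self, rgb):
        shared = self.encoder(rgb)                    # generic representation
        feats = self.initial_pred(shared)             # task-specific features
        depth = self.dec_depth(self.attn_depth(feats["depth"], feats))
        sem = self.dec_sem(self.attn_sem(feats["semantic"], feats))
        return depth, sem

if __name__ == "__main__":
    model = MultiTaskNet()
    depth, sem = model(torch.randn(2, 3, 128, 128))
    print(depth.shape, sem.shape)  # (2, 1, 128, 128) and (2, 40, 128, 128)
```

In the paper itself, the gating stand-in above would be replaced by true deformable attention (offset-based sampling of the fused feature map), and the single-convolution decoders by upsampling decoders, as described in the abstract.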
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Cong Liu, Xinghui Li, Ruipeng Ling, Yibo Xing, Boliang Li, Chaobo Zhang, and Xiaojun Liang "Deformable attention-guided network for multitask learning", Proc. SPIE 13239, Optoelectronic Imaging and Multimedia Technology XI, 132390Z (22 November 2024); https://doi.org/10.1117/12.3036091
KEYWORDS
Semantics
Deformation
RGB color model
Feature fusion
Feature extraction
Image segmentation
Convolution