Light detection and ranging (LiDAR)-camera systems are becoming increasingly vital for autonomous driving, and monocular three-dimensional (3D) detection is a critical and challenging task in this field. However, most algorithms rely solely on manually labeled images, a time-consuming and labor-intensive process, and the resulting detections lack depth information. To address this problem, a semi-supervised 3D object detection model based on LiDAR-camera systems (PVONet) is proposed to improve both detection accuracy and processing time. First, an innovative data-preparation block, point-voxel fusion estimation, is introduced; it uses LiDAR points to generate 3D bounding boxes for unlabeled data, significantly reducing annotation time compared with manual labeling. Second, a new block based on a fully connected neural network for box estimation is presented; it performs feature extraction, feature correlation, and 3D box estimation on monocular images. Finally, comprehensive experiments on the popular KITTI 3D detection dataset demonstrate that PVONet is faster (30 ms on the KITTI benchmark) and more accurate [with gains of 4.69%/3.82% (easy), 4.45%/2.79% (moderate), and 4.07%/3.75% (hard) on 3D/bird's-eye-view detection compared with the baseline]. This meets the real-time requirements of autonomous vehicle applications. The results demonstrate the effectiveness of our LiDAR-camera-based model.
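The core of the pseudo-labeling idea can be illustrated with a minimal sketch: given the LiDAR points belonging to one unlabeled object, fit a 3D bounding box to serve as an automatic label. This is only an axis-aligned toy version for illustration; the paper's point-voxel fusion estimation block is more sophisticated, and the function name `fit_3d_box` is hypothetical, not from the source.

```python
# Illustrative sketch (assumption, not PVONet's actual algorithm): derive a
# 3D bounding-box pseudo-label for an unlabeled object directly from its
# LiDAR points, avoiding manual annotation.
import numpy as np

def fit_3d_box(points: np.ndarray) -> dict:
    """Fit an axis-aligned 3D box (center, size) to an (N, 3) point cloud."""
    mins = points.min(axis=0)          # per-axis minimum coordinate
    maxs = points.max(axis=0)          # per-axis maximum coordinate
    center = (mins + maxs) / 2.0       # box center in LiDAR coordinates
    size = maxs - mins                 # extents along x, y, z
    return {"center": center, "size": size}

# Toy cluster of LiDAR returns on a single object.
pts = np.array([[0.0, 0.0, 0.0],
                [4.0, 2.0, 1.5],
                [2.0, 1.0, 0.5]])
box = fit_3d_box(pts)
```

In practice such boxes would be generated for every unlabeled frame and used as supervision targets for the monocular detector, which is what makes the approach semi-supervised.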
Keywords: object detection, 3D modeling, LiDAR, feature extraction, 3D image processing, voxels, cameras