Paper title
Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving
Paper authors
Paper abstract
Pseudo-LiDAR 3D detectors have made remarkable progress in monocular 3D detection by strengthening depth perception with depth estimation networks and by using LiDAR-based 3D detection architectures. Advanced stereo 3D detectors can also localize 3D objects accurately. The gap in image-to-image generation for stereo views is much smaller than that in image-to-LiDAR generation. Motivated by this, we propose a Pseudo-Stereo 3D detection framework with three novel virtual view generation methods (image-level generation, feature-level generation, and feature-clone) for detecting 3D objects from a single image. Our analysis of depth-aware learning shows that the depth loss is effective only in feature-level virtual view generation, whereas the estimated depth map is effective in both image-level and feature-level generation in our framework. We propose a disparity-wise dynamic convolution, with dynamic kernels sampled from the disparity feature map, that adaptively filters the features from a single image to generate virtual image features, which eases the feature degradation caused by depth estimation errors. As of submission (November 18, 2021), our Pseudo-Stereo 3D detection framework ranks 1st on car, pedestrian, and cyclist among the monocular 3D detectors with publications on the KITTI-3D benchmark. The code is released at https://github.com/revisitq/Pseudo-Stereo-3D.
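To make the idea of image-level virtual view generation more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a rectified stereo geometry, converts an estimated depth map to disparity with the standard relation disparity = focal length × baseline / depth, and forward-warps the left image into a virtual right view. The function names `depth_to_disparity` and `forward_warp_to_right`, the nearest-pixel rounding, and the toy calibration values are all assumptions made for illustration.

```python
import numpy as np


def depth_to_disparity(depth, focal_px, baseline_m):
    """Rectified-stereo relation: disparity (pixels) = f * B / depth.

    Pixels with non-positive depth are treated as invalid and get disparity 0.
    """
    depth = np.asarray(depth, dtype=np.float64)
    disparity = np.zeros_like(depth)
    valid = depth > 0
    disparity[valid] = focal_px * baseline_m / depth[valid]
    return disparity


def forward_warp_to_right(left, disparity):
    """Forward-warp a left image into a virtual right view.

    A pixel at column x in the left view lands at column x - d in the right
    view. When several pixels land on the same target, the one with the
    larger disparity (the nearer one) wins. Targets that receive no pixel
    stay 0 and are reported in the returned hole mask.
    """
    h, w = disparity.shape
    right = np.zeros_like(left)
    best = np.full((h, w), -1.0)          # largest disparity written so far
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            d = disparity[y, x]
            xr = x - int(round(d))        # nearest-pixel target column
            if 0 <= xr < w and d > best[y, xr]:
                right[y, xr] = left[y, x]
                best[y, xr] = d
                filled[y, xr] = True
    return right, ~filled


# Toy example (hypothetical calibration): f = 2 px, B = 1 m, depth = 2 m
# everywhere, so every pixel has a disparity of exactly 1 px.
left = np.array([[10, 20, 30, 40]])
depth = np.full((1, 4), 2.0)
disp = depth_to_disparity(depth, focal_px=2.0, baseline_m=1.0)
virtual_right, holes = forward_warp_to_right(left, disp)
```

In this toy case every pixel shifts one column to the left, and the rightmost column of the virtual view is left as a hole. A practical pipeline would need to fill such holes and handle sub-pixel disparities, and errors in the estimated depth translate directly into misplaced pixels, which is the kind of degradation the abstract's feature-level methods aim to ease.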