Paper Title
Multi-View Adaptive Fusion Network for 3D Object Detection
Paper Authors
Paper Abstract
3D object detection based on LiDAR-camera fusion is becoming an emerging research theme for autonomous driving. However, it has proven difficult to effectively fuse both modalities without information loss and interference. To address this issue, we propose a single-stage multi-view fusion framework that takes the LiDAR bird's-eye view, the LiDAR range view, and camera view images as inputs for 3D object detection. To effectively fuse multi-view features, we propose an attentive pointwise fusion (APF) module that estimates the importance of the three sources with an attention mechanism, achieving adaptive fusion of multi-view features in a pointwise manner. Furthermore, an attentive pointwise weighting (APW) module is designed to help the network learn structural information and point feature importance through two auxiliary tasks, namely foreground classification and center regression; the predicted foreground probability is then used to reweight the point features. We design an end-to-end learnable network named MVAF-Net to integrate these two components. Our evaluations on the KITTI 3D object detection dataset demonstrate that the proposed APF and APW modules offer significant performance gains. Moreover, the proposed MVAF-Net achieves the best performance among all single-stage fusion methods and outperforms most two-stage fusion methods, achieving the best trade-off between speed and accuracy on the KITTI benchmark.
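To make the two modules concrete, below is a minimal PyTorch sketch of how APF-style attentive fusion and APW-style foreground reweighting could look. The class names, MLP shapes, and the choice of per-view sigmoid gates are illustrative assumptions based on the abstract, not the authors' implementation.

```python
import torch
import torch.nn as nn


class AttentivePointwiseFusion(nn.Module):
    """APF-style sketch: for each point, predict a channel-wise attention
    weight for each of the three views and fuse the view features adaptively."""

    def __init__(self, channels: int):
        super().__init__()
        # One small gate per view scores the importance of that view's
        # features, conditioned on the concatenation of all three views.
        self.gates = nn.ModuleList([
            nn.Sequential(nn.Linear(3 * channels, channels), nn.Sigmoid())
            for _ in range(3)
        ])
        self.out = nn.Linear(3 * channels, channels)

    def forward(self, bev_feat, rv_feat, cam_feat):
        # Each input: (N_points, C) features gathered from one view.
        joint = torch.cat([bev_feat, rv_feat, cam_feat], dim=-1)
        weighted = [gate(joint) * feat
                    for gate, feat in zip(self.gates, (bev_feat, rv_feat, cam_feat))]
        return self.out(torch.cat(weighted, dim=-1))


class AttentivePointwiseWeighting(nn.Module):
    """APW-style sketch: auxiliary foreground classification and center
    regression heads; the foreground probability reweights point features."""

    def __init__(self, channels: int):
        super().__init__()
        self.fg_head = nn.Linear(channels, 1)      # foreground logit per point
        self.center_head = nn.Linear(channels, 3)  # offset to object center

    def forward(self, point_feat):
        fg_prob = torch.sigmoid(self.fg_head(point_feat))
        center_offset = self.center_head(point_feat)
        reweighted = point_feat * fg_prob          # suppress background points
        return reweighted, fg_prob, center_offset


# Usage example with random per-point features from the three views.
apf = AttentivePointwiseFusion(channels=64)
apw = AttentivePointwiseWeighting(channels=64)
bev, rv, cam = (torch.randn(1024, 64) for _ in range(3))
fused = apf(bev, rv, cam)
feat, fg_prob, center = apw(fused)
```

During training, the auxiliary foreground and center predictions would be supervised with extra losses, so that the reweighting step learns to emphasize points on objects; the abstract does not specify the loss formulation, so it is omitted here.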