FFPA-Net: Efficient Feature Fusion with Projection Awareness for 3D Object Detection

Paper Title

FFPA-Net: Efficient Feature Fusion with Projection Awareness for 3D Object Detection

Authors

Chaokang Jiang, Guangming Wang, Jinxing Wu, Yanzi Miao, Hesheng Wang

Abstract

The texture features of color images and the geometric information of LiDAR point clouds are promisingly complementary. However, many challenges remain for efficient and robust feature fusion in 3D object detection. In this paper, unstructured 3D point clouds are first filled into the 2D plane, and 3D point cloud features are extracted faster using projection-aware convolution layers. Further, the corresponding indexes between different sensor signals are established in advance during data preprocessing, which enables faster cross-modal feature fusion. To address the misalignment between LiDAR points and image pixels, two new plug-and-play fusion modules, LiCamFuse and BiLiCamFuse, are proposed. In LiCamFuse, soft query weights that perceive the Euclidean distance between bimodal features are proposed. In BiLiCamFuse, a fusion module with dual attention is proposed to deeply correlate the geometric and textural features of the scene. Quantitative results on the KITTI dataset demonstrate that the proposed method achieves better feature-level fusion. In addition, the proposed network shows a shorter running time than existing methods.
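The abstract does not say how the point-to-pixel indexes are built during preprocessing. One common way to obtain such a correspondence is a pinhole projection of each LiDAR point into the image, sketched below. This is an illustration under stated assumptions, not the authors' implementation: the function name `lidar_to_pixel_index`, the 3x4 matrix `proj`, and the toy values are all hypothetical, and the points are assumed to be already expressed in the camera frame.

```python
import math


def lidar_to_pixel_index(points, proj, height, width):
    """Return one (row, col) pixel index per 3D point, or None if it has no pixel.

    points: iterable of (x, y, z) in the camera frame (assumption).
    proj:   hypothetical 3x4 projection matrix given as three rows of four numbers.
    """
    indices = []
    for x, y, z in points:
        # Homogeneous projection: [u*d, v*d, d] = proj @ [x, y, z, 1].
        u_h, v_h, depth = (
            row[0] * x + row[1] * y + row[2] * z + row[3] for row in proj
        )
        if depth <= 1e-6:
            # Points behind (or on) the image plane have no valid pixel.
            indices.append(None)
            continue
        u = math.floor(u_h / depth)  # column
        v = math.floor(v_h / depth)  # row
        inside = 0 <= u < width and 0 <= v < height
        indices.append((v, u) if inside else None)
    return indices


# Toy intrinsics: focal length 100 px, principal point at (50, 40).
proj = [
    [100.0, 0.0, 50.0, 0.0],
    [0.0, 100.0, 40.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
points = [
    (0.0, 0.0, 10.0),   # on the optical axis
    (1.0, 0.5, 10.0),   # slightly right and down
    (0.0, 0.0, -5.0),   # behind the camera
]
index_table = lidar_to_pixel_index(points, proj, height=80, width=100)
```

Because the table depends only on the point coordinates and the calibration, it can be computed once per frame before the network runs, so a fusion layer only needs a lookup at inference time.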
