Deep Sensor Fusion with Pyramid Fusion Networks for 3D Semantic Segmentation

Paper Title

Deep Sensor Fusion with Pyramid Fusion Networks for 3D Semantic Segmentation

Authors

Hannah Schieber, Fabian Duerr, Torsten Schoen, Jürgen Beyerer

Abstract

Robust environment perception for autonomous vehicles is a tremendous challenge, which makes a diverse sensor set including, e.g., camera, lidar and radar crucial. In the process of understanding the recorded sensor data, 3D semantic segmentation plays an important role. Therefore, this work presents a pyramid-based deep fusion architecture for lidar and camera to improve 3D semantic segmentation of traffic scenes. Individual sensor backbones extract feature maps from camera images and lidar point clouds. A novel Pyramid Fusion Backbone fuses these feature maps at different scales and combines the multimodal features in a feature pyramid to compute valuable multimodal, multi-scale features. The Pyramid Fusion Head aggregates these pyramid features and further refines them in a late fusion step, incorporating the final features of the sensor backbones. The approach is evaluated on two challenging outdoor datasets, and different fusion strategies and setups are investigated. It outperforms recent range-view-based lidar approaches as well as all fusion strategies and architectures proposed so far.
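
The data flow described in the abstract (fuse per scale, then combine the fused maps in a feature pyramid) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the fusion operator or the layer design, so element-wise averaging and upsample-and-add are stand-in assumptions here, and the names `fuse`, `upsample2x`, `pyramid_fusion` and `const_map` are hypothetical. Feature maps are single-channel 2D lists, and the late fusion step of the Pyramid Fusion Head is left out.

```python
# Toy sketch of multi-scale fusion followed by top-down pyramid aggregation.
# Averaging stands in for the learned fusion modules (an assumption; the
# abstract does not say how features are fused).

def fuse(cam, lidar):
    """Fuse two same-sized feature maps by element-wise averaging."""
    return [[(c + l) / 2.0 for c, l in zip(cr, lr)]
            for cr, lr in zip(cam, lidar)]

def upsample2x(fmap):
    """Nearest-neighbour upsampling by a factor of 2 in both dimensions."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def pyramid_fusion(cam_feats, lidar_feats):
    """Fuse per scale, then aggregate coarse-to-fine by upsample-and-add.

    Both inputs are lists ordered fine to coarse, with each level half the
    height and width of the previous one.
    """
    fused = [fuse(c, l) for c, l in zip(cam_feats, lidar_feats)]
    out = fused[-1]  # start from the coarsest fused level
    for level in reversed(fused[:-1]):
        up = upsample2x(out)
        out = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(level, up)]
    return out

def const_map(size, value):
    """Helper: a size x size map filled with one value."""
    return [[value] * size for _ in range(size)]

# Two scales per sensor: 4x4 (fine) and 2x2 (coarse).
cam_feats = [const_map(4, 1.0), const_map(2, 2.0)]
lidar_feats = [const_map(4, 3.0), const_map(2, 4.0)]
result = pyramid_fusion(cam_feats, lidar_feats)
```

In a real network each averaging step would be replaced by learned layers operating on multi-channel tensors; the sketch only shows how coarse multimodal features propagate down to the finest scale.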
