Paper Title
A Simple Baseline for Multi-Camera 3D Object Detection
Paper Authors
Paper Abstract
3D object detection with surrounding cameras is a promising direction for autonomous driving. In this paper, we present SimMOD, a Simple baseline for Multi-camera Object Detection. To incorporate multi-view information and build upon previous efforts in monocular 3D object detection, the framework is built on sample-wise object proposals and works in a two-stage manner. First, we extract multi-scale features and generate perspective object proposals on each monocular image. Second, the multi-view proposals are aggregated and then iteratively refined with multi-view, multi-scale visual features in the DETR3D style. The refined proposals are decoded end-to-end into detection results. To further boost performance, we incorporate auxiliary branches alongside proposal generation to enhance feature learning. We also design target filtering and teacher forcing to promote the consistency of the two-stage training. We conduct extensive experiments on the nuScenes 3D object detection benchmark to demonstrate the effectiveness of SimMOD, achieving new state-of-the-art performance. Code will be available at https://github.com/zhangyp15/SimMOD.
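To make the two-stage flow described in the abstract concrete, below is a minimal sketch (not the authors' released code) of the second stage: per-camera proposals have already been aggregated into query embeddings with 3D reference points, which are then refined DETR3D-style by projecting each reference point into every camera and sampling image features. The module names, single-scale sampling, layer sizes, and number of refinement iterations are illustrative assumptions.

```python
# Minimal sketch of DETR3D-style iterative proposal refinement (assumed, simplified).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProposalRefiner(nn.Module):
    def __init__(self, embed_dim=256, num_iters=4):
        super().__init__()
        self.num_iters = num_iters
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)
        self.ref_head = nn.Linear(embed_dim, 3)  # predicts a delta for each 3D reference point
        self.norm = nn.LayerNorm(embed_dim)

    def sample_multiview(self, feats, ref_points, cam_projs):
        """Project 3D reference points into each camera and bilinearly sample features.

        feats:      (V, C, H, W) per-camera feature maps (single scale for brevity)
        ref_points: (Q, 3)       3D reference points in the ego frame
        cam_projs:  (V, 3, 4)    camera projection matrices (intrinsics @ extrinsics)
        """
        V, C, H, W = feats.shape
        Q = ref_points.shape[0]
        homo = torch.cat([ref_points, ref_points.new_ones(Q, 1)], dim=-1)    # (Q, 4)
        pix = torch.einsum('vij,qj->vqi', cam_projs, homo)                   # (V, Q, 3)
        depth = pix[..., 2:3]
        uv = pix[..., :2] / depth.clamp(min=1e-3)                            # pixel coordinates
        # normalize to [-1, 1] for grid_sample
        grid = torch.stack([uv[..., 0] / W * 2 - 1, uv[..., 1] / H * 2 - 1], dim=-1)
        sampled = F.grid_sample(feats, grid.unsqueeze(2), align_corners=False)  # (V, C, Q, 1)
        # mask out points behind the camera or outside the image
        valid = (depth.squeeze(-1) > 1e-3) & (grid.abs() <= 1).all(dim=-1)      # (V, Q)
        sampled = sampled.squeeze(-1).permute(0, 2, 1) * valid.unsqueeze(-1)    # (V, Q, C)
        return sampled.sum(0) / valid.sum(0).clamp(min=1).unsqueeze(-1)         # (Q, C)

    def forward(self, queries, ref_points, feats, cam_projs):
        for _ in range(self.num_iters):
            img_feat = self.sample_multiview(feats, ref_points, cam_projs)    # (Q, C)
            q = (queries + img_feat).unsqueeze(0)
            queries = self.norm(self.attn(q, q, q)[0].squeeze(0) + queries)   # self-attention among proposals
            ref_points = ref_points + self.ref_head(queries)                  # iterative refinement
        return queries, ref_points


# Toy usage: 6 surround cameras, 32x88 feature maps, 100 aggregated proposals.
feats = torch.randn(6, 256, 32, 88)
cam_projs = torch.randn(6, 3, 4)
queries, ref_points = torch.randn(100, 256), torch.rand(100, 3) * 50
refiner = ProposalRefiner()
queries, ref_points = refiner(queries, ref_points, feats, cam_projs)
print(queries.shape, ref_points.shape)  # torch.Size([100, 256]) torch.Size([100, 3])
```

In the full method, the refined queries would additionally pass through classification and box-regression heads (the end-to-end decoding mentioned above), and feature sampling would use multiple scales; those parts are omitted here for brevity.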