Kinematic 3D Object Detection in Monocular Video

Paper Title

Kinematic 3D Object Detection in Monocular Video

Authors

Garrick Brazil, Gerard Pons-Moll, Xiaoming Liu, Bernt Schiele

Abstract

Perceiving the physical world in 3D is fundamental for self-driving applications. Although temporal motion is an invaluable resource to human vision for detection, tracking, and depth perception, such features have not been thoroughly utilized in modern 3D object detectors. In this work, we propose a novel method for monocular video-based 3D object detection which carefully leverages kinematic motion to improve precision of 3D localization. Specifically, we first propose a novel decomposition of object orientation as well as a self-balancing 3D confidence. We show that both components are critical to enable our kinematic model to work effectively. Collectively, using only a single model, we efficiently leverage 3D kinematics from monocular videos to improve the overall localization precision in 3D object detection while also producing useful by-products of scene dynamics (ego-motion and per-object velocity). We achieve state-of-the-art performance on monocular 3D object detection and the Bird's Eye View tasks within the KITTI self-driving dataset.

Paper Title