HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving

Paper Title

HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving

Authors

Andrei Zanfir, Mihai Zanfir, Alexander Gorban, Jingwei Ji, Yin Zhou, Dragomir Anguelov, Cristian Sminchisescu

Abstract

Autonomous driving is an exciting new industry, posing important research questions. Within the perception module, 3D human pose estimation is an emerging technology, which can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians. While hardware systems and sensors have dramatically improved over the decades -- with cars potentially boasting complex LiDAR and vision systems and with a growing expansion of the available body of dedicated datasets for this newly available information -- not much work has been done to harness these novel signals for the core problem of 3D human pose estimation. Our method, which we coin HUM3DIL (HUMan 3D from Images and LiDAR), efficiently makes use of these complementary signals, in a semi-supervised fashion and outperforms existing methods with a large margin. It is a fast and compact model for onboard deployment. Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages. Quantitative experiments on the Waymo Open Dataset support these claims, where we achieve state-of-the-art results on the task of 3D pose estimation.
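
The abstract's phrase "embed LiDAR points into pixel-aligned multi-modal features" can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simple pinhole camera, points already expressed in the camera frame, nearest-pixel sampling, and clamping of out-of-image projections, none of which are stated in the abstract. The function names and parameters (`project_to_pixel`, `pixel_aligned_features`, `fx`, `fy`, `cx`, `cy`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def project_to_pixel(point: Point, fx: float, fy: float,
                     cx: float, cy: float) -> Tuple[int, int]:
    """Project a camera-frame 3D point (z forward) to integer pixel (u, v).

    Assumption: an ideal pinhole camera with no lens distortion.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    u = fx * x / z + cx
    v = fy * y / z + cy
    return int(round(u)), int(round(v))


def pixel_aligned_features(points: Sequence[Point],
                           feature_map: Sequence[Sequence[Sequence[float]]],
                           fx: float, fy: float,
                           cx: float, cy: float) -> List[List[float]]:
    """Concatenate each point's xyz with the image feature at its projection.

    feature_map is indexed as feature_map[row][col] -> feature vector.
    Projections outside the image are clamped to the border (a
    simplification chosen for this sketch only).
    """
    height = len(feature_map)
    width = len(feature_map[0])
    fused = []
    for p in points:
        u, v = project_to_pixel(p, fx, fy, cx, cy)
        u = min(max(u, 0), width - 1)   # clamp column index
        v = min(max(v, 0), height - 1)  # clamp row index
        fused.append(list(p) + list(feature_map[v][u]))
    return fused
```

Each returned row holds one LiDAR point's coordinates followed by the image feature sampled at its projected pixel. In a pipeline like the one the abstract describes, such per-point tokens would then be passed to the Transformer refinement stages, which this sketch does not model.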
