Paper Title
MonStereo: When Monocular and Stereo Meet at the Tail of 3D Human Localization
Paper Authors
Paper Abstract
Monocular and stereo visions are cost-effective solutions for 3D human localization in the context of self-driving cars or social robots. However, they are usually developed independently and have their respective strengths and limitations. We propose a novel unified learning framework that leverages the strengths of both monocular and stereo cues for 3D human localization. Our method jointly (i) associates humans in left-right images, (ii) deals with occluded and distant cases in stereo settings by relying on the robustness of monocular cues, and (iii) tackles the intrinsic ambiguity of monocular perspective projection by exploiting prior knowledge of the human height distribution. We specifically evaluate outliers as well as challenging instances, such as occluded and far-away pedestrians, by analyzing the entire error distribution and by estimating calibrated confidence intervals. Finally, we critically review the official KITTI 3D metrics and propose a practical 3D localization metric tailored for humans.
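The intrinsic ambiguity mentioned in point (iii) stems from pinhole projection: a pedestrian of true height H at depth z appears with pixel height h ≈ f·H / z, so a single image constrains depth only up to the unknown H. A minimal sketch of this geometric intuition follows, assuming a KITTI-like focal length and a Gaussian-style human-height prior (mean 1.70 m, spread 0.09 m); the constants and the helper function are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch (not the authors' code): under the pinhole model,
# pixel height h ≈ f * H / z for a person of true height H at depth z,
# so depth is recoverable only up to the spread of the human-height prior.

F_PIXELS = 721.5             # assumed focal length in pixels (KITTI-like camera)
H_MEAN, H_STD = 1.70, 0.09   # assumed human-height prior (metres)

def depth_from_pixel_height(h_pixels, f=F_PIXELS, h_mean=H_MEAN, h_std=H_STD):
    """Back-project a detected pixel height into a depth estimate and a
    rough 1-sigma interval induced by the height prior."""
    z_mean = f * h_mean / h_pixels
    z_low = f * (h_mean - h_std) / h_pixels
    z_high = f * (h_mean + h_std) / h_pixels
    return z_mean, (z_low, z_high)

if __name__ == "__main__":
    # Pedestrians of shrinking apparent size: the interval widens with depth,
    # which is why far-away instances are the hard "tail" of the problem.
    for h in (120.0, 60.0, 30.0):
        z, (lo, hi) = depth_from_pixel_height(h)
        print(f"pixel height {h:5.1f}px -> depth ~ {z:5.1f} m  ({lo:.1f}-{hi:.1f} m)")
```

Note how the 1-sigma interval grows linearly with depth: the same height uncertainty that is negligible for a nearby pedestrian dominates the error for distant or occluded ones, which motivates combining the monocular prior with stereo cues as the abstract describes.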