NeRF-Supervision: Learning Dense Object Descriptors from Neural Radiance Fields

Paper title

NeRF-Supervision: Learning Dense Object Descriptors from Neural Radiance Fields

Authors

Lin Yen-Chen, Pete Florence, Jonathan T. Barron, Tsung-Yi Lin, Alberto Rodriguez, Phillip Isola

Abstract

Thin, reflective objects such as forks and whisks are common in our daily lives, but they are particularly challenging for robot perception because it is hard to reconstruct them using commodity RGB-D cameras or multi-view stereo techniques. While traditional pipelines struggle with objects like these, Neural Radiance Fields (NeRFs) have recently been shown to be remarkably effective for performing view synthesis on objects with thin structures or reflective materials. In this paper we explore the use of NeRF as a new source of supervision for robust robot vision systems. In particular, we demonstrate that a NeRF representation of a scene can be used to train dense object descriptors. We use an optimized NeRF to extract dense correspondences between multiple views of an object, and then use these correspondences as training data for learning a view-invariant representation of the object. NeRF's usage of a density field allows us to reformulate the correspondence problem with a novel distribution-of-depths formulation, as opposed to the conventional approach of using a depth map. Dense correspondence models supervised with our method significantly outperform off-the-shelf learned descriptors by 106% (PCK@3px metric, more than doubling performance) and outperform our baseline supervised with multi-view stereo by 29%. Furthermore, we demonstrate the learned dense descriptors enable robots to perform accurate 6-degree of freedom (6-DoF) pick and place of thin and reflective objects.
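
The abstract contrasts a distribution of depths with a single depth map. The sketch below illustrates that contrast using the standard NeRF volume-rendering weights; it is not the authors' implementation. The function names (ray_weights, expected_depth, sample_depth) and the toy density values are assumptions made for illustration. On a ray that grazes a thin object and then hits a surface behind it, the density has two peaks: the expected depth falls between them, where no surface exists, while a depth sampled from the weights lands on one of the peaks.

```python
import math
import random


def ray_weights(sigmas, deltas):
    """Standard volume-rendering weights: w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    weights = []
    transmittance = 1.0  # T_i: fraction of light that reaches sample i
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def expected_depth(weights, depths):
    """Single depth-map value: the weighted mean of the sample depths."""
    total = sum(weights)
    return sum(w * t for w, t in zip(weights, depths)) / total


def sample_depth(weights, depths, rng=random):
    """Treat the weights as an unnormalized distribution over depth and draw one sample.

    Raises ValueError if every weight is zero (the ray hits nothing).
    """
    return rng.choices(depths, weights=weights, k=1)[0]


# Toy ray (hypothetical values): empty space, a thin semi-opaque structure
# at depth 1.0, empty space, then an opaque surface at depth 2.0.
depths = [0.5, 1.0, 1.5, 2.0]
deltas = [0.5, 0.5, 0.5, 0.5]
sigmas = [0.0, 1.4, 0.0, 50.0]

weights = ray_weights(sigmas, deltas)
mean_depth = expected_depth(weights, depths)
drawn_depth = sample_depth(weights, depths, random.Random(0))
```

Here mean_depth lies strictly between the two surfaces, whereas drawn_depth is always 1.0 or 2.0. Under this reading, correspondences generated from sampled depths stay on actual surfaces even when the density along a ray is multimodal.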
