Paper Title

CPS++: Improving Class-level 6D Pose and Shape Estimation From Monocular Images With Self-Supervised Learning

Paper Authors

Fabian Manhardt, Gu Wang, Benjamin Busam, Manuel Nickel, Sven Meier, Luca Minciullo, Xiangyang Ji, Nassir Navab

Paper Abstract

Contemporary monocular 6D pose estimation methods can only cope with a handful of object instances. This naturally hampers possible applications as, for instance, robots seamlessly integrated in everyday processes necessarily require the ability to work with hundreds of different objects. To tackle this problem of immanent practical relevance, we propose a novel method for class-level monocular 6D pose estimation, coupled with metric shape retrieval. Unfortunately, acquiring adequate annotations is very time-consuming and labor-intensive. This is especially true for class-level 6D pose estimation, as one is required to create a highly detailed reconstruction for all objects and then annotate each object and scene using these models. To overcome this shortcoming, we additionally propose the idea of synthetic-to-real domain transfer for class-level 6D poses by means of self-supervised learning, which removes the burden of collecting numerous manual annotations. In essence, after training our proposed method fully supervised with synthetic data, we leverage recent advances in differentiable rendering to self-supervise the model with unannotated real RGB-D data, improving inference on the latter. We experimentally demonstrate that we can retrieve precise 6D poses and metric shapes from a single RGB image.
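
To make the self-supervision step concrete, the following is a minimal, self-contained sketch of the render-and-compare idea, not the authors' implementation: a predicted metric shape is transformed by a predicted 6D pose and aligned against points back-projected from an unannotated real depth map. As a stand-in for a full differentiable renderer, the sketch uses a chamfer-style point-set alignment; the shape, pose, and target values are illustrative placeholders.

```python
# Minimal sketch of differentiable render-and-compare self-supervision
# (illustrative only; CPS++ itself renders the predicted shape and compares
# it against real RGB-D observations). Requires only PyTorch.
import torch


def axis_angle_to_matrix(r):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = r.norm()
    k = r / (theta + 1e-8)
    zero = torch.zeros((), dtype=r.dtype)
    K = torch.stack([
        torch.stack([zero, -k[2], k[1]]),
        torch.stack([k[2], zero, -k[0]]),
        torch.stack([-k[1], k[0], zero]),
    ])
    eye = torch.eye(3, dtype=r.dtype)
    return eye + torch.sin(theta) * K + (1.0 - torch.cos(theta)) * (K @ K)


def chamfer(a, b):
    """Symmetric nearest-neighbour distance between point sets (N,3), (M,3)."""
    d = torch.cdist(a, b)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()


# Canonical metric shape the network would predict (placeholder point cloud).
shape = torch.randn(256, 3) * 0.05

# "Observed" points, standing in for a back-projected real depth map.
gt_R = axis_angle_to_matrix(torch.tensor([0.3, -0.2, 0.1]))
gt_t = torch.tensor([0.02, -0.01, 0.55])
observed = (shape @ gt_R.T + gt_t).detach()

# Pose parameters to refine; in the paper's setting the pose comes from the
# network and the gradient updates the network weights instead.
r = torch.tensor([0.1, 0.0, 0.2], requires_grad=True)   # axis-angle rotation
t = torch.tensor([0.0, 0.0, 0.50], requires_grad=True)  # translation (metres)

opt = torch.optim.Adam([r, t], lr=1e-2)
for step in range(200):
    R = axis_angle_to_matrix(r)
    pred = shape @ R.T + t           # predicted shape in the camera frame
    loss = chamfer(pred, observed)   # render-and-compare surrogate loss
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final alignment loss: {loss.item():.6f}")
```

In the actual pipeline the loss gradient flows back into the network weights rather than into raw pose parameters, but the mechanism, a differentiable comparison between the prediction and a real unannotated observation, is the same.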
