Paper Title
3D-Augmented Contrastive Knowledge Distillation for Image-based Object Pose Estimation
Paper Authors
Paper Abstract
Image-based object pose estimation is appealing because in real applications the 3D shape of an object is often unavailable and cannot be captured as easily as a photograph. Although this is an advantage to some extent, the shape information left unexplored in a 3D vision learning problem is a "flaw in the jade". In this paper, we address the problem in a reasonable new setting: 3D shape is exploited during training, while testing remains purely image-based. We enhance the performance of image-based methods for category-agnostic object pose estimation by exploiting 3D knowledge learned by a multi-modal method. Specifically, we propose a novel contrastive knowledge distillation framework that effectively transfers the 3D-augmented image representation from a multi-modal model to an image-based model. We integrate contrastive learning into the two-stage training procedure of knowledge distillation, formulating an advanced solution for combining these two approaches on cross-modal tasks. Experiments show state-of-the-art results, outperforming existing category-agnostic image-based methods by a large margin (up to +5% improvement on the ObjectNet3D dataset), demonstrating the effectiveness of our method.
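The core idea of contrastive knowledge distillation can be illustrated with an InfoNCE-style objective: each image-only student embedding is pulled toward the frozen multi-modal teacher's embedding of the same sample while being pushed away from other samples in the batch. This is only a minimal sketch of that general technique, not the authors' actual implementation; all names (`contrastive_distillation_loss`, the embedding dimensions, the temperature value) are illustrative assumptions.

```python
import numpy as np

def contrastive_distillation_loss(student, teacher, temperature=0.07):
    """InfoNCE-style distillation loss (illustrative sketch, not the paper's code).

    Each row of `student` (image-only model) should match the same row of
    `teacher` (frozen multi-modal model) against all other rows in the batch.
    """
    # L2-normalize so the dot product becomes cosine similarity
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = s @ t.T / temperature                    # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal: same sample, two models
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
teacher_emb = rng.normal(size=(8, 128))                       # frozen teacher features
aligned_student = teacher_emb + 0.1 * rng.normal(size=(8, 128))  # nearly distilled student
random_student = rng.normal(size=(8, 128))                    # untrained student

loss_aligned = contrastive_distillation_loss(aligned_student, teacher_emb)
loss_random = contrastive_distillation_loss(random_student, teacher_emb)
print(loss_aligned < loss_random)  # a well-distilled student yields lower loss
```

Minimizing this loss drives the student's image representation toward the teacher's 3D-augmented representation, which is the transfer mechanism the abstract describes at a high level.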