Title
Sim2Real Object-Centric Keypoint Detection and Description
Authors
Abstract
Keypoint detection and description play a central role in computer vision. Most existing methods are in the form of scene-level prediction, without returning the object classes of different keypoints. In this paper, we propose the object-centric formulation, which, beyond the conventional setting, further requires identifying which object each interest point belongs to. With such fine-grained information, our framework enables additional downstream tasks, such as object-level matching and pose estimation in a cluttered environment. To get around the difficulty of label collection in the real world, we develop a sim2real contrastive learning mechanism that can generalize a model trained in simulation to real-world applications. The novelties of our training method are three-fold: (i) we integrate uncertainty into the learning framework to improve the feature description of hard cases, e.g., less-textured or symmetric patches; (ii) we decouple the object descriptor into two output branches -- intra-object salience and inter-object distinctness -- resulting in a better pixel-wise description; (iii) we enforce cross-view semantic consistency for enhanced robustness in representation learning. Comprehensive experiments on image matching and 6D pose estimation verify the encouraging generalization ability of our method from simulation to reality. In particular, for 6D pose estimation our method significantly outperforms typical unsupervised/sim2real methods, narrowing the gap with the fully supervised counterpart. Additional results and videos can be found at https://zhongcl-thu.github.io/rock/