Paper Title
Pose Estimation for Robot Manipulators via Keypoint Optimization and Sim-to-Real Transfer
Paper Authors
Paper Abstract
Keypoint detection is an essential building block for many robotic applications, such as motion capture and pose estimation. Historically, keypoints have been detected using uniquely engineered markers such as checkerboards or fiducials. More recently, deep learning methods have been explored because they can detect user-defined keypoints in a markerless manner. However, different manually selected keypoints can perform unevenly in detection and localization. One example is symmetric robotic tools, where DNN detectors cannot solve the correspondence problem correctly. In this work, we propose a new, autonomous way to define keypoint locations that overcomes these challenges. The approach finds the optimal set of keypoints on robotic manipulators for robust visual detection and localization. Using a robotic simulator as a medium, our algorithm trains DNNs on synthetic data and iteratively optimizes the selection of keypoints. The results show that the detection performance of the DNNs improves significantly when the optimized keypoints are used. We further use the optimized keypoints in real robotic applications, applying domain randomization to bridge the reality gap between the simulator and the physical world. The physical-world experiments show how the proposed method can be applied to the wide breadth of robotic applications that require visual feedback, such as camera-to-robot calibration, robotic tool tracking, and end-effector pose estimation.
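The abstract does not spell out the iterative optimization, so the following is only a toy sketch of one way such a loop could look, not the paper's algorithm. It assumes a greedy swap scheme: score the currently selected keypoints, find the worst one, and replace it whenever a candidate from the reserve pool scores better. The names `evaluate_detection_error` and `optimize_keypoints`, and the per-candidate `difficulty` field, are hypothetical. In a real pipeline the scoring step would render synthetic images in the simulator, train the DNN, and measure detection error.

```python
import random


def evaluate_detection_error(candidate, rng):
    """Hypothetical stand-in for rendering synthetic data, training the DNN,
    and measuring the detection error of one candidate keypoint."""
    # Assumption: each candidate carries a hidden difficulty; the small noise
    # term mimics run-to-run variance in training.
    return candidate["difficulty"] + rng.uniform(0.0, 0.05)


def optimize_keypoints(candidates, k, iterations, seed=0):
    """Greedy swap loop (illustrative only): keep k keypoints and repeatedly
    try to replace the worst-scoring one with a candidate from the reserve."""
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)
    selected, reserve = pool[:k], pool[k:]
    for _ in range(iterations):
        if not reserve:
            break
        # Score the current selection and locate its weakest keypoint.
        errors = [evaluate_detection_error(c, rng) for c in selected]
        worst = max(range(len(selected)), key=errors.__getitem__)
        # Take the next challenger from the back of the reserve queue.
        challenger = reserve.pop()
        if evaluate_detection_error(challenger, rng) < errors[worst]:
            reserve.insert(0, selected[worst])  # demote the weakest keypoint
            selected[worst] = challenger
        else:
            reserve.insert(0, challenger)  # challenger returns to the queue
    return selected


# Ten hypothetical candidate locations with increasing detection difficulty.
candidates = [{"name": f"kp{i}", "difficulty": i / 10} for i in range(10)]
best = optimize_keypoints(candidates, k=3, iterations=200)
print(sorted(c["name"] for c in best))
```

Because the noise here is smaller than the gap between candidate difficulties, the loop settles on the three easiest candidates. With real training variance, each score would likely need to be averaged over several runs before a swap is accepted.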