Paper Title
Signs of Language: Embodied Sign Language Fingerspelling Acquisition from Demonstrations for Human-Robot Interaction
Paper Authors
Paper Abstract
Learning fine-grained movements is a challenging topic in robotics, particularly in the context of robotic hands. One specific instance of this challenge is the acquisition of fingerspelling sign language in robots. In this paper, we propose an approach for learning dexterous motor imitation from video examples without additional information. To achieve this, we first build a URDF model of a robotic hand with a single actuator for each joint. We then leverage pre-trained deep vision models to extract the 3D pose of the hand from RGB videos. Next, using state-of-the-art reinforcement learning algorithms for motion imitation (namely, proximal policy optimization and soft actor-critic), we train a policy to reproduce the movement extracted from the demonstrations. We identify the optimal set of hyperparameters for imitation based on a reference motion. Finally, we demonstrate the generalizability of our approach by testing it on six different tasks, corresponding to fingerspelled letters. Our results show that our approach is able to successfully imitate these fine-grained movements without additional information, highlighting its potential for real-world applications in robotics.
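The abstract does not specify the reward used to score how closely the policy reproduces the demonstrated hand motion. A common choice in motion-imitation work (e.g., DeepMimic-style tracking) is an exponentiated negative pose error between the robot's joint angles and the reference frame extracted from the video. The sketch below illustrates that idea under those assumptions; the function name, the gain `k`, and the joint ordering are all hypothetical, not taken from the paper.

```python
import numpy as np

def tracking_reward(q, q_ref, k=5.0):
    """DeepMimic-style pose-tracking reward (illustrative, not the
    paper's exact formulation).

    q     -- current joint angles of the robot hand (radians)
    q_ref -- reference joint angles for this frame, extracted from
             the 3D hand pose of the demonstration video
    k     -- hypothetical gain controlling how sharply the reward
             falls off with tracking error

    Returns a value in (0, 1]; it equals 1.0 when the pose matches
    the reference exactly and decays toward 0 as the error grows.
    """
    err = np.sum((np.asarray(q, dtype=float)
                  - np.asarray(q_ref, dtype=float)) ** 2)
    return float(np.exp(-k * err))

# Usage: a perfect match yields the maximum reward, and a small
# deviation on each joint yields a strictly lower one.
q_ref = np.zeros(20)                 # e.g., 20 actuated joints
r_perfect = tracking_reward(q_ref, q_ref)
r_offset = tracking_reward(q_ref + 0.05, q_ref)
```

In a PPO or SAC setup, this per-step reward would be summed over the episode while the policy plays back the fingerspelling motion in simulation, so maximizing return directly corresponds to minimizing deviation from the demonstration.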