Paper Title
DexVIP: Learning Dexterous Grasping with Human Hand Pose Priors from Video
Paper Authors
Paper Abstract
Dexterous multi-fingered robotic hands have a formidable action space, yet their morphological similarity to the human hand holds immense potential to accelerate robot learning. We propose DexVIP, an approach to learn dexterous robotic grasping from human-object interactions present in in-the-wild YouTube videos. We do this by curating grasp images from human-object interaction videos and imposing a prior over the agent's hand pose when learning to grasp with deep reinforcement learning. A key advantage of our method is that the learned policy is able to leverage free-form in-the-wild visual data. As a result, it can easily scale to new objects, and it sidesteps the standard practice of collecting human demonstrations in a lab -- a much more expensive and indirect way to capture human expertise. Through experiments on 27 objects with a 30-DoF simulated robot hand, we demonstrate that DexVIP compares favorably to existing approaches that lack a hand pose prior or rely on specialized tele-operation equipment to obtain human demonstrations, while also being faster to train. Project page: https://vision.cs.utexas.edu/projects/dexvip-dexterous-grasp-pose-prior
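To make the pose-prior idea concrete, here is a minimal illustrative sketch of how a hand-pose prior could shape a reinforcement learning reward, as the abstract describes. This is not the authors' released code: the function names (pose_prior_reward, shaped_reward) and the parameters (scale, pose_weight) are hypothetical, and the prior pose stands in for a hand pose estimated from a curated YouTube grasp image.

import numpy as np

def pose_prior_reward(agent_pose: np.ndarray,
                      prior_pose: np.ndarray,
                      scale: float = 1.0) -> float:
    """Reward the agent for matching a human hand pose estimated from video.

    Both poses are vectors of hand joint angles (e.g., 30 DoF). The reward
    decays with the L2 distance between them, so it is highest when the
    agent's pose matches the human prior.
    """
    distance = np.linalg.norm(agent_pose - prior_pose)
    return float(np.exp(-scale * distance))

def shaped_reward(task_reward: float,
                  agent_pose: np.ndarray,
                  prior_pose: np.ndarray,
                  pose_weight: float = 0.5) -> float:
    """Combine the environment's grasping reward with the pose-prior term."""
    return task_reward + pose_weight * pose_prior_reward(agent_pose, prior_pose)

# Example: a 30-DoF hand close to the prior earns a higher shaped reward.
prior = np.zeros(30)                       # stand-in for a pose from a video frame
agent = prior + 0.05 * np.random.randn(30)  # agent's current joint angles
print(shaped_reward(task_reward=1.0, agent_pose=agent, prior_pose=prior))

Under this sketch, the prior acts purely as reward shaping: the agent still learns to grasp via the task reward, but poses resembling human grasps from video are preferred, without requiring in-lab demonstrations.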