Affordance Learning from Play for Sample-Efficient Policy Learning

Paper Title

Affordance Learning from Play for Sample-Efficient Policy Learning

Authors

Jessica Borja-Diaz, Oier Mees, Gabriel Kalweit, Lukas Hermann, Joschka Boedecker, Wolfram Burgard

Abstract

Robots operating in human-centered environments should have the ability to understand how objects function: what can be done with each object, where this interaction may occur, and how the object is used to achieve a goal. To this end, we propose a novel approach that extracts a self-supervised visual affordance model from human teleoperated play data and leverages it to enable efficient policy learning and motion planning. We combine model-based planning with model-free deep reinforcement learning (RL) to learn policies that favor the same object regions favored by people, while requiring minimal robot interactions with the environment. We evaluate our algorithm, Visual Affordance-guided Policy Optimization (VAPO), with both diverse simulation manipulation tasks and real world robot tidy-up experiments to demonstrate the effectiveness of our affordance-guided policies. We find that our policies train 4x faster than the baselines and generalize better to novel objects because our visual affordance model can anticipate their affordance regions.
