Paper Title

Learning Dexterous Manipulation from Suboptimal Experts

Paper Authors

Rae Jeong, Jost Tobias Springenberg, Jackie Kay, Daniel Zheng, Yuxiang Zhou, Alexandre Galashov, Nicolas Heess, Francesco Nori

Paper Abstract

Learning dexterous manipulation in high-dimensional state-action spaces is an important open challenge with exploration presenting a major bottleneck. Although in many cases the learning process could be guided by demonstrations or other suboptimal experts, current RL algorithms for continuous action spaces often fail to effectively utilize combinations of highly off-policy expert data and on-policy exploration data. As a solution, we introduce Relative Entropy Q-Learning (REQ), a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms. It represents the optimal policy via importance sampling from a learned prior and is well-suited to take advantage of mixed data distributions. We demonstrate experimentally that REQ outperforms several strong baselines on robotic manipulation tasks for which suboptimal experts are available. We show how suboptimal experts can be constructed effectively by composing simple waypoint tracking controllers, and we also show how learned primitives can be combined with waypoint controllers to obtain reference behaviors to bootstrap a complex manipulation task on a simulated bimanual robot with human-like hands. Finally, we show that REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations. Videos and further materials are available at sites.google.com/view/rlfse.
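The abstract states that REQ "represents the optimal policy via importance sampling from a learned prior." A minimal sketch of that idea, assuming the common exponentiated-Q weighting scheme: draw candidate actions from a prior policy, weight each by its Q-value, and resample. The function names (`prior_sample`, `q_value`, `req_style_action`) and the softmax temperature are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)

def req_style_action(state, prior_sample, q_value, n_samples=32, temperature=1.0):
    """Pick an action by importance sampling from a learned prior.

    Hypothetical illustration of the mechanism named in the abstract:
    candidates come from the prior policy and are re-weighted by
    exponentiated Q-values before resampling.
    """
    # Draw candidate actions from the (learned) prior policy.
    actions = np.stack([prior_sample(state) for _ in range(n_samples)])
    # Score each candidate with the Q-function.
    q = np.array([q_value(state, a) for a in actions])
    # Softmax importance weights (max-subtracted for numerical stability).
    weights = np.exp((q - q.max()) / temperature)
    weights /= weights.sum()
    # Resample one candidate in proportion to its weight.
    idx = rng.choice(n_samples, p=weights)
    return actions[idx]

# Toy usage: 2-D actions, Q peaks at the action [1, 1].
prior = lambda s: rng.normal(size=2)
qfun = lambda s, a: -np.sum((a - 1.0) ** 2)
action = req_style_action(np.zeros(3), prior, qfun)
```

With a low temperature the resampling concentrates on the highest-Q candidates, while a high temperature stays close to the prior; this trade-off is what makes such a scheme suitable for the mixed on-policy/off-policy data distributions discussed above.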
