Improving Robot Dual-System Motor Learning with Intrinsically Motivated Meta-Control and Latent-Space Experience Imagination

Paper Title

Improving Robot Dual-System Motor Learning with Intrinsically Motivated Meta-Control and Latent-Space Experience Imagination

Authors

Muhammad Burhan Hafez, Cornelius Weber, Matthias Kerzel, Stefan Wermter

Abstract

Combining model-based and model-free learning systems has been shown to improve the sample efficiency of learning to perform complex robotic tasks. However, dual-system approaches fail to consider the reliability of the learned model when it is applied to make multiple-step predictions, resulting in compounding prediction errors and performance degradation. In this paper, we present a novel dual-system motor learning approach in which a meta-controller arbitrates online between model-based and model-free decisions based on an estimate of the local reliability of the learned model. The reliability estimate is used to compute an intrinsic feedback signal that encourages actions leading to data that improves the model. Our approach also integrates arbitration with imagination: a learned latent-space model generates imagined experiences, based on its local reliability, to be used as additional training data. We evaluate our approach against baseline and state-of-the-art methods on learning vision-based robotic grasping in simulation and in the real world. The results show that our approach outperforms the compared methods and learns near-optimal grasping policies in dense- and sparse-reward environments.
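The abstract gives no formulas, so the following is only a minimal sketch of what reliability-based arbitration could look like, not the authors' implementation. The reliability formula, the threshold rule, the intrinsic-reward definition, and all function and parameter names (`reliability`, `arbitrate`, `intrinsic_reward`, `scale`, `threshold`) are assumptions made for illustration.

```python
def reliability(recent_errors, scale=1.0):
    """Map recent local model prediction errors to a score in [0, 1].

    Assumption: reliability decays with the mean prediction error
    observed near the current state. The paper may define it differently.
    """
    if not recent_errors:
        return 0.0  # no evidence yet: treat the model as unreliable
    mean_error = sum(recent_errors) / len(recent_errors)
    return 1.0 / (1.0 + mean_error / scale)


def arbitrate(local_reliability, threshold=0.5):
    """Hypothetical meta-controller rule: use the model only where it is reliable."""
    return "model_based" if local_reliability >= threshold else "model_free"


def intrinsic_reward(reliability_before, reliability_after):
    """Hypothetical intrinsic signal: reward data that made the model more reliable."""
    return max(0.0, reliability_after - reliability_before)


# Usage: low recent errors select the model-based system, high errors the model-free one.
print(arbitrate(reliability([0.1, 0.2])))
print(arbitrate(reliability([3.0, 5.0])))
```

Under the same assumptions, imagined experience would be generated only from states where `reliability` exceeds the threshold, so that unreliable regions of the model do not contaminate the training data.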
