Paper Title
Learning Temporally Extended Skills in Continuous Domains as Symbolic Actions for Planning
Paper Authors
Paper Abstract
Problems which require both long-horizon planning and continuous control capabilities pose significant challenges to existing reinforcement learning agents. In this paper we introduce a novel hierarchical reinforcement learning agent which links temporally extended skills for continuous control with a forward model in a symbolic, discrete abstraction of the environment's state for planning. We term our agent SEADS for Symbolic Effect-Aware Diverse Skills. We formulate an objective and a corresponding algorithm which leads to unsupervised learning of a diverse set of skills through intrinsic motivation, given a known state abstraction. The skills are jointly learned with a symbolic forward model which captures the effect of skill execution in the state abstraction. After training, we can leverage the skills as symbolic actions using the forward model for long-horizon planning and subsequently execute the plan using the learned continuous-action control skills. The proposed algorithm learns skills and forward models that can be used to solve complex tasks which require both continuous control and long-horizon planning capabilities with a high success rate. It compares favorably with other flat and hierarchical reinforcement learning baseline agents and is successfully demonstrated with a real robot.
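The abstract's planning step, using a learned symbolic forward model to chain skills into a long-horizon plan, can be illustrated with a minimal sketch. This is not the SEADS implementation (the abstract gives no algorithmic details); it only assumes a forward model `forward_model(state, skill)` that predicts the successor symbolic state, and uses plain breadth-first search over the discrete abstraction. All names here are hypothetical.

```python
from collections import deque

def plan_skills(forward_model, start, goal, num_skills):
    """Breadth-first search in the symbolic state abstraction.
    forward_model(state, skill) -> predicted successor symbolic state.
    Returns a shortest list of skill indices reaching `goal`, or None."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        for skill in range(num_skills):
            nxt = forward_model(state, skill)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, plan + [skill]))
    return None  # goal unreachable under the learned model

# Toy stand-in model: symbolic states are integers 0..7, skill k adds k+1 (mod 8).
model = lambda s, k: (s + k + 1) % 8
print(plan_skills(model, start=0, goal=5, num_skills=3))  # -> [1, 2]
```

At execution time, each planned skill index would be handed to the corresponding learned continuous-control policy, which realizes the predicted symbolic transition in the real environment.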