Paper Title
Efficient Exploration in Constrained Environments with Goal-Oriented Reference Path
Paper Authors
Paper Abstract
In this paper, we consider the problem of building learning agents that can efficiently learn to navigate in constrained environments. The main goal is to design agents that can efficiently learn to understand and generalize to different environments using high-dimensional inputs (a 2D map), while following feasible paths that avoid collisions in obstacle-cluttered environments. To achieve this, we make use of traditional path planning algorithms, supervised learning, and reinforcement learning in a synergistic way. The key idea is to decouple the navigation problem into planning and control, where the former is achieved by supervised learning and the latter by reinforcement learning. Specifically, we train a deep convolutional network that predicts collision-free paths from a map of the environment; this path is then used by a reinforcement learning algorithm that learns to follow it closely. This allows the trained agent to achieve good generalization while learning faster. We test our proposed method in the recently proposed Safety Gym suite, which allows testing of safety constraints during the training of learning agents. We compare our proposed method with existing work and show that it consistently improves sample efficiency and generalization to novel environments.
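To make the decoupling concrete, below is a minimal sketch (not the authors' implementation) of the planning half described in the abstract: a convolutional network that maps a 2D occupancy map to a fixed number of waypoints, trained by supervised regression against paths from a classical planner. The reinforcement learning controller that follows the predicted path is only indicated in comments. All names, input sizes, and hyperparameters here are illustrative assumptions.

```python
# Sketch of the supervised path-prediction component, assuming PyTorch,
# 64x64 single-channel occupancy maps, and 10 predicted 2D waypoints.
import torch
import torch.nn as nn


class PathPredictor(nn.Module):
    """Predicts a fixed number of 2D waypoints from a 2D occupancy map."""

    def __init__(self, num_waypoints: int = 10):
        super().__init__()
        self.num_waypoints = num_waypoints
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # 16 -> 8
            nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Linear(64 * 8 * 8, num_waypoints * 2)  # (x, y) per waypoint

    def forward(self, occupancy_map: torch.Tensor) -> torch.Tensor:
        # occupancy_map: (batch, 1, 64, 64) -> waypoints: (batch, num_waypoints, 2)
        features = self.encoder(occupancy_map)
        return self.head(features).view(-1, self.num_waypoints, 2)


if __name__ == "__main__":
    # Supervised step: regress waypoints produced by a classical planner (e.g. A*).
    model = PathPredictor()
    maps = torch.rand(4, 1, 64, 64)       # dummy occupancy maps
    planner_paths = torch.rand(4, 10, 2)  # dummy target paths from a planner
    loss = nn.functional.mse_loss(model(maps), planner_paths)
    loss.backward()
    # The predicted path would then serve as the reference for an RL controller
    # (e.g. rewarded for progress along the path while avoiding constraint
    # violations), which is outside the scope of this sketch.
```

In this view, the network only has to learn the mapping from map to reference path, while the harder closed-loop control problem is left to the reinforcement learner, which is consistent with the sample-efficiency argument made in the abstract.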