Paper Title
Emergency action termination for immediate reaction in hierarchical reinforcement learning
Paper Authors
Paper Abstract
Hierarchical decomposition of control is unavoidable in large dynamical systems. In reinforcement learning (RL), it is usually realized with subgoals that are defined at higher policy levels and achieved at lower policy levels. Reaching these goals can take a substantial amount of time, during which it is not verified whether they are still worth pursuing. However, due to the randomness of the environment, these goals may become obsolete. In this paper, we address this gap in state-of-the-art approaches and propose a method in which the validity of higher-level actions (and thus of lower-level goals) is constantly verified at the higher level. If these actions, i.e., lower-level goals, become inadequate, they are replaced by more appropriate ones. In this way, we combine the advantage of hierarchical RL, namely fast training, with that of flat RL, namely immediate reactivity. We study our approach experimentally on seven benchmark environments.
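The abstract describes a control loop in which the high-level policy re-checks the current subgoal at every environment step and replaces it as soon as it becomes inadequate, rather than committing to it for a fixed horizon. The sketch below is only a minimal illustration of that idea, not the paper's implementation; `env`, `high_policy`, `low_policy`, and `is_adequate` are assumed placeholder names for the environment, the two policy levels, and the validity check.

```python
# Minimal sketch of hierarchical control with emergency subgoal termination.
# All callables below are hypothetical stand-ins, not the authors' code.

def hierarchical_episode(env, high_policy, low_policy, is_adequate, max_steps=1000):
    state = env.reset()
    subgoal = high_policy(state)          # high-level action = low-level goal
    for _ in range(max_steps):
        # Emergency termination: verify the current subgoal before every
        # low-level step instead of pursuing it for a fixed number of steps.
        if not is_adequate(state, subgoal):
            subgoal = high_policy(state)  # replace with a more appropriate goal
        action = low_policy(state, subgoal)
        state, reward, done, info = env.step(action)
        if done:
            break
    return state
```

The key design choice this sketch highlights is that the adequacy check runs at the frequency of the low-level controller, which is what gives the hierarchical agent the immediate reactivity of a flat policy.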