Paper Title

Robust Reinforcement Learning via Genetic Curriculum

Paper Authors

Yeeho Song, Jeff Schneider

Paper Abstract

Achieving robust performance is crucial when applying deep reinforcement learning (RL) in safety-critical systems. Some state-of-the-art approaches try to address the problem with adversarial agents, but these agents often require expert supervision to fine-tune and prevent the adversary from becoming too challenging to the trainee agent. While other approaches involve automatically adjusting environment setups during training, they have been limited to simple environments where low-dimensional encodings can be used. Inspired by these approaches, we propose genetic curriculum, an algorithm that automatically identifies scenarios in which the agent currently fails and generates an associated curriculum to help the agent learn to solve the scenarios and acquire more robust behaviors. As a non-parametric optimizer, our approach uses a raw, non-fixed encoding of scenarios, reducing the need for expert supervision and allowing our algorithm to adapt to the changing performance of the agent. Our empirical studies show improvement in robustness over the existing state-of-the-art algorithms, providing training curricula that result in agents being 2-8x less likely to fail without sacrificing cumulative reward. We include an ablation study and share insights on why our algorithm outperforms prior approaches.
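The abstract only describes the algorithm at a high level. The sketch below illustrates what such a genetic-curriculum training loop could look like; it is a minimal, hypothetical reading of the abstract, not the authors' implementation. It assumes scenarios are encoded as raw, variable-length lists of floats and that the agent exposes `fails(scenario)` and `train_on(scenario)` methods; all of these names and the genetic operators are illustrative assumptions.

```python
# Hypothetical sketch of a genetic-curriculum loop, based only on the abstract.
# Scenario encoding, operators, and the agent API are illustrative assumptions.
import random

def crossover(a, b):
    """One-point crossover on raw, variable-length scenario encodings."""
    i = random.randint(0, min(len(a), len(b)))
    return a[:i] + b[i:]

def mutate(scenario, rate=0.1):
    """Randomly perturb scenario tokens (assumes tokens are floats)."""
    return [t + random.gauss(0, 1) if random.random() < rate else t
            for t in scenario]

def genetic_curriculum(agent, seed_scenarios, generations=10, pop_size=32):
    # Seed the population with scenarios the agent currently fails.
    population = [s for s in seed_scenarios if agent.fails(s)]
    for _ in range(generations):
        if len(population) < 2:
            break  # the agent solves (almost) everything; stop early
        # Breed new candidate scenarios with genetic operators on the raw
        # encoding -- no fixed low-dimensional parameterization is required.
        children = [mutate(crossover(*random.sample(population, 2)))
                    for _ in range(pop_size)]
        # The curriculum is the set of scenarios the agent still fails,
        # so it adapts automatically as the agent's performance changes.
        population = [s for s in children if agent.fails(s)]
        for scenario in population:
            agent.train_on(scenario)  # hypothetical training call
    return agent
```

Because the population is re-filtered by the agent's current failures on every generation, this kind of loop never needs an expert to tune adversary strength: scenarios the agent has mastered drop out of the curriculum on their own.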
