Title

Learning Reward Models for Cooperative Trajectory Planning with Inverse Reinforcement Learning and Monte Carlo Tree Search

Authors

Karl Kurzer, Matthias Bitzer, J. Marius Zöllner

Abstract

Cooperative trajectory planning methods for automated vehicles can solve traffic scenarios that require a high degree of cooperation between traffic participants. However, for cooperative systems to integrate into human-centered traffic, the automated systems must behave human-like so that humans can anticipate the system's decisions. While Reinforcement Learning has made remarkable progress in solving the decision-making part, it is non-trivial to parameterize a reward model that yields predictable actions. This work employs feature-based Maximum Entropy Inverse Reinforcement Learning combined with Monte Carlo Tree Search to learn reward models that maximize the likelihood of recorded multi-agent cooperative expert trajectories. The evaluation demonstrates that the approach can recover a reasonable reward model that mimics the expert and performs similarly to a manually tuned baseline reward model.
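
To make the learning step concrete, below is a minimal sketch of one feature-based Maximum Entropy IRL update, assuming a linear reward model r(s) = w · φ(s). The function name maxent_irl_update and the plan_trajectories callable are hypothetical; in the paper, trajectory samples would come from the Monte Carlo Tree Search planner, which is abstracted away here.

```python
import numpy as np

def maxent_irl_update(weights, expert_features, plan_trajectories, lr=0.1):
    """One gradient-ascent step on the log-likelihood of expert trajectories.

    weights           -- current linear reward parameters w, shape (d,)
    expert_features   -- empirical feature expectations of the recorded
                         expert trajectories, shape (d,)
    plan_trajectories -- hypothetical callable that samples trajectories
                         from the planner (e.g. MCTS) under reward w and
                         returns their feature vectors, shape (n, d)
    """
    # Feature expectations of the planner's trajectories under the
    # current reward model r(s) = w . phi(s)
    sampled = np.asarray(plan_trajectories(weights))
    policy_features = sampled.mean(axis=0)

    # MaxEnt IRL gradient: expert feature counts minus the planner's
    # expected feature counts under the current reward
    gradient = expert_features - policy_features
    return weights + lr * gradient
```

Iterating this update until the planner's feature expectations match the expert's is what drives the learned reward toward maximizing the likelihood of the recorded cooperative trajectories.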
