Paper Title

Spatiotemporal Costmap Inference for MPC via Deep Inverse Reinforcement Learning

Authors

Keuntaek Lee, David Isele, Evangelos A. Theodorou, Sangjae Bae

Abstract

It can be difficult to autonomously produce driver behavior so that it appears natural to other traffic participants. Through Inverse Reinforcement Learning (IRL), we can automate this process by learning the underlying reward function from human demonstrations. We propose a new IRL algorithm that learns a goal-conditioned spatiotemporal reward function. The resulting costmap is used by Model Predictive Controllers (MPCs) to perform a task without any hand-designing or hand-tuning of the cost function. We evaluate our proposed Goal-conditioned SpatioTemporal Zeroing Maximum Entropy Deep IRL (GSTZ)-MEDIRL framework together with MPC in the CARLA simulator for autonomous driving, lane keeping, and lane changing tasks in a challenging dense traffic highway scenario. Our proposed methods show higher success rates compared to other baseline methods including behavior cloning, state-of-the-art RL policies, and MPC with a learning-based behavior prediction model.
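The abstract describes an MPC that selects actions by evaluating candidate trajectories against the IRL-learned spatiotemporal costmap. The following is a minimal, hypothetical sketch of that interface, assuming a sampling-based MPC and a discretized costmap of shape (horizon, height, width); all names, shapes, and the random-walk rollout generator are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

T, H, W = 10, 32, 32                 # planning horizon, grid height/width
# Stands in for the costmap the IRL network would output (T x H x W).
costmap = rng.random((T, H, W))

def rollout_cost(traj, costmap):
    """Sum costmap values along one trajectory.

    traj: (T, 2) integer array of (row, col) grid cells, one per time step.
    """
    t = np.arange(len(traj))
    return costmap[t, traj[:, 0], traj[:, 1]].sum()

# Sample K candidate rollouts (random walks on the grid, for illustration only;
# a real MPC would roll out a vehicle dynamics model under sampled controls).
K = 64
start = np.array([H // 2, W // 2])
steps = rng.integers(-1, 2, size=(K, T, 2))          # per-step moves in {-1, 0, 1}
trajs = np.clip(start + np.cumsum(steps, axis=1), 0, [H - 1, W - 1])

costs = np.array([rollout_cost(tr, costmap) for tr in trajs])
best = trajs[np.argmin(costs)]       # the MPC executes the cheapest rollout
```

Because the cost of each rollout comes entirely from the learned costmap, no cost-function terms need to be hand-designed or hand-tuned, which is the property the abstract emphasizes.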
