Paper Title

DeFIX: Detecting and Fixing Failure Scenarios with Reinforcement Learning in Imitation Learning Based Autonomous Driving

Paper Authors

Resul Dagdanov, Feyza Eksen, Halil Durmus, Ferhat Yurdakul, Nazim Kemal Ure

Paper Abstract

Safely navigating through an urban environment without violating any traffic rules is a crucial performance target for reliable autonomous driving. In this paper, we present a Reinforcement Learning (RL) based methodology to DEtect and FIX (DeFIX) failures of an Imitation Learning (IL) agent by extracting infraction spots and reconstructing mini-scenarios on these infraction areas to train an RL agent for fixing the shortcomings of the IL approach. DeFIX is a continuous learning framework, where extraction of failure scenarios and training of RL agents are executed in an infinite loop. After each new policy is trained and added to the library of policies, a policy classifier method effectively decides which policy to activate at each step during evaluation. It is demonstrated that even with only one RL agent trained on the failure scenarios of an IL agent, the DeFIX method is either competitive with or outperforms state-of-the-art IL- and RL-based autonomous urban driving benchmarks. We trained and validated our approach on the most challenging map (Town05) of the CARLA simulator, which involves complex, realistic, and adversarial driving scenarios. The source code is publicly available at https://github.com/data-and-decision-lab/DeFIX.
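The abstract describes a detect-and-fix control flow: run the agent, extract infraction spots, rebuild them as mini-scenarios, train an RL agent on each, grow a library of policies, and let a policy classifier choose which policy to activate at every step. The toy Python sketch below illustrates only that control flow; every name in it (il_policy, train_rl_agent, PolicyClassifier, evaluate, and the toy infraction model) is a hypothetical stand-in, not the authors' implementation, which lives in the linked repository.

```python
# Toy sketch of the DeFIX detect-and-fix loop from the abstract.
# All names and the infraction model are hypothetical stand-ins.

def il_policy(state):
    # Stand-in for the pretrained Imitation Learning agent.
    return "follow_lane"

def train_rl_agent(scenario):
    # Stand-in for RL training on one reconstructed mini-scenario;
    # returns a policy specialized to that failure case.
    def rl_policy(state):
        return "avoid_infraction"
    return rl_policy

class PolicyClassifier:
    """Decides which policy in the library to activate at each step."""
    def __init__(self):
        self.failure_states = set()

    def fit(self, failure_states):
        self.failure_states |= set(failure_states)

    def select(self, library, state):
        # Use the most recently trained RL policy near known failure
        # spots; otherwise fall back to the base IL policy.
        return library[-1] if state in self.failure_states else library[0]

def evaluate(library, classifier, route):
    """Drive the route and return the states where infractions occur."""
    infractions = []
    for state in route:
        action = classifier.select(library, state)(state)
        # Toy infraction model: lane-following fails at intersections.
        if action == "follow_lane" and state == "intersection":
            infractions.append(state)
    return infractions

# DeFIX continuous-learning loop: detect failures, rebuild them as
# mini-scenarios, train RL fixers, and grow the policy library.
library = [il_policy]
classifier = PolicyClassifier()
route = ["straight", "intersection", "straight"]

for _ in range(5):  # the paper describes this as an infinite loop
    infractions = evaluate(library, classifier, route)
    if not infractions:
        break
    for spot in infractions:
        library.append(train_rl_agent(scenario=spot))
    classifier.fit(infractions)

print("policies in library:", len(library))  # 2: the IL agent + one RL fixer
```

In this sketch the first evaluation pass exposes the IL agent's failure at the intersection, a single RL "fixer" is trained on that spot, and the second pass completes without infractions, mirroring the paper's claim that even one RL agent trained on an IL failure scenario can close the gap.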
