Paper Title

SafeRL-Kit: Evaluating Efficient Reinforcement Learning Methods for Safe Autonomous Driving

Paper Authors

Linrui Zhang, Qin Zhang, Li Shen, Bo Yuan, Xueqian Wang

Paper Abstract

Safe reinforcement learning (RL) has achieved significant success on risk-sensitive tasks and shown promise in autonomous driving (AD) as well. Considering the distinctiveness of this community, efficient and reproducible baselines are still lacking for safe AD. In this paper, we release SafeRL-Kit to benchmark safe RL methods for AD-oriented tasks. Concretely, SafeRL-Kit contains several of the latest algorithms specific to zero-constraint-violation tasks, including Safety Layer, Recovery RL, the off-policy Lagrangian method, and Feasible Actor-Critic. In addition to existing approaches, we propose a novel first-order method named Exact Penalty Optimization (EPO) and sufficiently demonstrate its capability in safe AD. All algorithms in SafeRL-Kit are implemented (i) under the off-policy setting, which improves sample efficiency and can better leverage past logs; and (ii) within a unified learning framework, providing off-the-shelf interfaces for researchers to incorporate their domain-specific knowledge into fundamental safe RL methods. Finally, we conduct a comparative evaluation of the above algorithms in SafeRL-Kit and shed light on their efficacy for safe autonomous driving. The source code is available at https://github.com/zlr20/saferl_kit.
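To make the idea behind a first-order exact-penalty method concrete, the sketch below shows the generic exact-penalty surrogate that such an approach can optimize in an off-policy actor-critic: the constrained objective (maximize reward subject to a cost limit) is replaced by a single unconstrained loss with a ReLU-shaped penalty. The function name, the penalty factor kappa, and the cost limit are illustrative assumptions, not the exact SafeRL-Kit implementation.

```python
import torch

def exact_penalty_loss(reward_q: torch.Tensor,
                       cost_q: torch.Tensor,
                       cost_limit: float = 0.0,
                       kappa: float = 5.0) -> torch.Tensor:
    """Unconstrained surrogate for: maximize E[reward_q] s.t. E[cost_q] <= cost_limit.

    Classic exact-penalty theory: if kappa exceeds the optimal Lagrange
    multiplier, minimizing this penalized loss also solves the original
    constrained problem, so no dual (multiplier) update is required.
    All names and defaults here are illustrative assumptions.
    """
    # ReLU-shaped penalty, active only where the cost estimate exceeds the
    # allowed limit (a limit of zero corresponds to zero-violation tasks).
    violation = torch.relu(cost_q - cost_limit).mean()
    return -reward_q.mean() + kappa * violation
```

In an actor update, reward_q and cost_q would typically be the reward and cost critics evaluated at the current policy's actions; because the penalty is a simple first-order term, the update avoids the separate multiplier adaptation used by Lagrangian-style methods.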
