Paper Title

Bilateral Deep Reinforcement Learning Approach for Better-than-human Car Following Model

Authors

Tianyu Shi, Yifei Ai, Omar ElSamadisy, Baher Abdulhai

Abstract

In the coming years and decades, autonomous vehicles (AVs) will become increasingly prevalent, offering new opportunities for safer and more convenient travel and potentially smarter traffic control methods that exploit automation and connectivity. Car following is a prime function in autonomous driving. Car following based on reinforcement learning (RL) has received attention in recent years, with the goal of learning and achieving performance levels comparable to humans. However, most existing RL methods model car following as a unilateral problem, sensing only the vehicle ahead. Recent work by Wang and Horn [16] has shown that bilateral car following, which considers both the vehicle ahead and the vehicle behind, exhibits better system stability. In this paper we hypothesize that this bilateral car following can be learned using RL, together with other goals such as efficiency maximization, jerk minimization, and safety rewards, leading to a learned model that outperforms human driving. We propose a Deep Reinforcement Learning (DRL) framework for car following control that integrates bilateral information into both the state and the reward function, based on the bilateral control model (BCM). Furthermore, we use a decentralized multi-agent reinforcement learning framework to generate the corresponding control action for each agent. Our simulation results demonstrate that our learned policy is better than the human driving policy in terms of (a) inter-vehicle headways, (b) average speed, (c) jerk, (d) Time to Collision (TTC), and (e) string stability.
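To make the ideas summarized in the abstract concrete, below is a minimal sketch (not the authors' implementation) of how a bilateral car-following agent's observation and reward might be assembled. All function names, weights, and the specific reward terms (front/rear gap balance in the spirit of the BCM, average speed, jerk, and a TTC-based safety penalty) are assumptions introduced for illustration; the paper's actual state and reward definitions may differ.

```python
import numpy as np

def bilateral_observation(x_front, v_front, x_ego, v_ego, x_back, v_back):
    """Hypothetical bilateral state: gaps and relative speeds to both the
    leading and the following vehicle, plus the ego speed."""
    gap_front = x_front - x_ego          # headway to the vehicle ahead
    gap_back = x_ego - x_back            # headway to the vehicle behind
    return np.array([
        gap_front, v_front - v_ego,      # forward-looking terms (unilateral models stop here)
        gap_back,  v_back - v_ego,       # backward-looking terms added by the bilateral view
        v_ego,
    ], dtype=np.float32)

def bilateral_reward(obs, accel, prev_accel, dt=0.1,
                     w_balance=1.0, w_speed=0.1, w_jerk=0.01, w_safety=1.0):
    """Hypothetical reward combining the goals named in the abstract:
    gap balance (BCM-style), efficiency, jerk minimization, and safety (TTC)."""
    gap_front, dv_front, gap_back, dv_back, v_ego = obs
    balance = -w_balance * (gap_front - gap_back) ** 2   # BCM idea: keep front and rear gaps equal
    efficiency = w_speed * v_ego                          # encourage higher average speed
    jerk = -w_jerk * ((accel - prev_accel) / dt) ** 2     # penalize rapid changes in acceleration
    ttc = gap_front / max(-dv_front, 1e-3)                # time to collision with the leader
    safety = -w_safety if ttc < 2.0 else 0.0              # penalize dangerously small TTC
    return balance + efficiency + jerk + safety
```

In a decentralized multi-agent setting like the one described, each vehicle would compute such an observation and reward locally from its own neighbors and select its own acceleration, rather than relying on a single centralized controller.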
