Paper Title

HARL: A Novel Hierarchical Adversary Reinforcement Learning for Autonomous Intersection Management

Authors

Li, Guanzhou; Wu, Jianping; He, Yujing

Abstract

As an emerging technology, Connected Autonomous Vehicles (CAVs) are believed to be able to move through intersections faster and more safely, thanks to effective Vehicle-to-Everything (V2X) communication and global observation. Autonomous intersection management (AIM) is a key path to efficient crossing at intersections: it reduces unnecessary slowdowns and stops through the adaptive decision process of each CAV, enabling fuller utilization of the intersection space. Distributed reinforcement learning (DRL) offers a flexible, end-to-end model for AIM that adapts to many intersection scenarios. However, DRL is prone to collisions because the actions of the multiple parties in complicated interactions are sampled from a generic policy, which restricts the application of DRL in realistic scenarios. To address this, we propose a hierarchical RL framework in which models at different levels vary in receptive scope, action step length, and feedback period of reward. The upper-layer model accelerates CAVs to keep them from colliding, while the lower-layer model adjusts the trends from the upper-layer model so that the change of motion state does not cause new conflicts. The real action of a CAV at each step is co-determined by the trends from both levels, forming a real-time balance in the adversarial process. The proposed model proved effective in experiments at a complicated intersection with 4 branches and 4 lanes per branch, and shows better performance than the baselines.
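To make the two-level idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It only illustrates an upper level that updates its trend on a longer step length and a lower level that corrects it at every step, with the executed action determined by both. The class name, the hand-written stand-in policies, the update period, the thresholds, and the sum-and-clip combination rule are all assumptions made for illustration; the abstract does not specify any of them.

```python
from dataclasses import dataclass


@dataclass
class TwoLevelController:
    """Toy two-level controller. Every number and rule here is an assumption."""
    upper_period: int = 5     # upper level re-decides every N steps (assumed)
    max_accel: float = 3.0    # actuator limit in m/s^2 (assumed)
    upper_trend: float = 0.0  # most recent upper-level acceleration trend

    def upper_policy(self, global_gap: float) -> float:
        # Stand-in for a learned upper-level policy with a wide receptive
        # scope: speed up when a conflicting vehicle is near.
        return 2.0 if global_gap < 20.0 else 0.5

    def lower_policy(self, local_gap: float) -> float:
        # Stand-in for a learned lower-level policy with a narrow receptive
        # scope: push back when the upper trend would create a new conflict.
        return -2.5 if local_gap < 8.0 else 0.0

    def act(self, step: int, global_gap: float, local_gap: float) -> float:
        # The upper level acts on a longer step length than the lower level.
        if step % self.upper_period == 0:
            self.upper_trend = self.upper_policy(global_gap)
        lower_trend = self.lower_policy(local_gap)
        # Assumed combination rule: sum both trends, then clip to the limit.
        combined = self.upper_trend + lower_trend
        return max(-self.max_accel, min(self.max_accel, combined))


if __name__ == "__main__":
    ctrl = TwoLevelController()
    for step, (g, l) in enumerate([(10.0, 30.0), (50.0, 5.0), (50.0, 30.0)]):
        print(step, ctrl.act(step, g, l))
```

In this sketch the upper trend persists between its updates, so the lower level is the only part that reacts at every step; in the paper both levels are learned models rather than fixed rules.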
