Paper Title
Hierarchical Reinforcement Learning for Self-Driving Decision-Making without Reliance on Labeled Driving Data
Paper Authors
Paper Abstract
Decision making for self-driving cars is usually tackled by manually encoding rules from drivers' behaviors or by imitating drivers' manipulation with supervised learning techniques. Both approaches rely on large-scale driving data to cover all possible driving scenarios. This paper presents a hierarchical reinforcement learning method for decision making of self-driving cars that does not depend on a large amount of labeled driving data. The method comprehensively considers both high-level maneuver selection and low-level motion control in the lateral and longitudinal directions. We first decompose the driving task into three maneuvers, namely driving in lane, right lane change, and left lane change, and learn a sub-policy for each maneuver. A master policy is then learned to choose which maneuver policy to execute in the current state. All policies, including the master policy and the maneuver policies, are represented by fully-connected neural networks and trained with asynchronous parallel reinforcement learners (APRL), which build a mapping from sensory outputs to driving decisions. A different state space and reward function is designed for each maneuver. We apply the method to a highway driving scenario and show that it achieves smooth and safe decision making for self-driving cars.
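Below is a minimal sketch, in PyTorch, of the hierarchical policy structure the abstract describes: a fully-connected master policy that selects one of three maneuver sub-policies, each of which outputs low-level lateral and longitudinal controls for its own state representation. The class names, layer sizes, state dimensions, and the decide helper are illustrative assumptions and are not taken from the paper.

import torch
import torch.nn as nn

MANEUVERS = ["drive_in_lane", "left_lane_change", "right_lane_change"]

class ManeuverPolicy(nn.Module):
    """Fully-connected sub-policy for one maneuver (assumed architecture)."""
    def __init__(self, state_dim: int, action_dim: int = 2, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),  # e.g. steering and acceleration commands
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

class MasterPolicy(nn.Module):
    """Fully-connected master policy that scores the available maneuvers."""
    def __init__(self, state_dim: int, num_maneuvers: int = len(MANEUVERS), hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, num_maneuvers),  # logits over maneuvers
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def decide(master, sub_policies, master_state, maneuver_states):
    """Hierarchical decision step: the master policy picks a maneuver, then the
    corresponding sub-policy produces the low-level control. Each maneuver uses
    its own state vector, per the abstract's per-maneuver state spaces."""
    maneuver_idx = master(master_state).argmax(dim=-1).item()
    maneuver = MANEUVERS[maneuver_idx]
    control = sub_policies[maneuver](maneuver_states[maneuver])
    return maneuver, control

if __name__ == "__main__":
    # Example wiring with assumed (hypothetical) state dimensions per maneuver.
    master = MasterPolicy(state_dim=26)
    sub_policies = {
        "drive_in_lane": ManeuverPolicy(state_dim=18),
        "left_lane_change": ManeuverPolicy(state_dim=22),
        "right_lane_change": ManeuverPolicy(state_dim=22),
    }
    master_state = torch.zeros(26)
    maneuver_states = {
        "drive_in_lane": torch.zeros(18),
        "left_lane_change": torch.zeros(22),
        "right_lane_change": torch.zeros(22),
    }
    maneuver, control = decide(master, sub_policies, master_state, maneuver_states)
    print(maneuver, control)

In an actual training setup the master policy and each sub-policy would be optimized with the paper's asynchronous parallel reinforcement learners and maneuver-specific reward functions; the sketch above only illustrates how the two levels of the hierarchy fit together at decision time.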