Sample Efficient Interactive End-to-End Deep Learning for Self-Driving Cars with Selective Multi-Class Safe Dataset Aggregation

Paper Title

Sample Efficient Interactive End-to-End Deep Learning for Self-Driving Cars with Selective Multi-Class Safe Dataset Aggregation

Authors

Bicer, Yunus; Alizadeh, Ali; Ure, Nazim Kemal; Erdogan, Ahmetcan; Kizilirmak, Orkun

Abstract

The objective of this paper is to develop a sample-efficient end-to-end deep learning method for self-driving cars, in which we attempt to increase the value of the information extracted from samples through careful analysis of each call to the expert driver's policy. End-to-end imitation learning is a popular method for computing self-driving car policies. The standard approach relies on collecting pairs of inputs (camera images) and outputs (steering angle, etc.) from an expert policy and fitting a deep neural network to this data to learn the driving policy. Although this approach has had some successful demonstrations in the past, learning a good policy might require a large number of samples from the expert driver, which can be resource-consuming. In this work, we develop a novel framework based on the Safe Dataset Aggregation (Safe DAgger) approach, where the current learned policy is automatically segmented into different trajectory classes, and the algorithm identifies the trajectory segments or classes with weak performance at each step. Once the trajectory segments with weak performance are identified, the sampling algorithm focuses on calling the expert policy only on these segments, which improves the convergence rate. The presented simulation results show that the proposed approach can yield significantly better performance than the standard Safe DAgger algorithm while using the same number of samples from the expert.
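
The selective sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how trajectory classes are defined or how weak performance is measured, so the class labels, the per-segment error values, the `threshold` parameter, and the function names `weak_classes` and `selective_aggregate` are all assumptions made for illustration. The safety mechanism of Safe DAgger itself is not modeled here.

```python
from collections import defaultdict
from statistics import mean


def weak_classes(segments, threshold):
    """Return the trajectory classes whose mean error exceeds `threshold`.

    `segments` is a list of (class_label, error) pairs gathered while
    rolling out the current learned policy. Both the labels and the error
    metric are hypothetical; the abstract does not specify them.
    """
    errors = defaultdict(list)
    for label, error in segments:
        errors[label].append(error)
    return {label for label, errs in errors.items() if mean(errs) > threshold}


def selective_aggregate(dataset, rollout, expert, threshold):
    """Run one aggregation step that queries the expert only on weak classes.

    `rollout` is a list of (class_label, observation, error) triples and
    `expert` is a callable mapping an observation to the expert's action.
    New (observation, expert_action) pairs are appended to `dataset`.
    Returns the number of expert calls made in this step.
    """
    weak = weak_classes([(label, err) for label, _, err in rollout], threshold)
    calls = 0
    for label, obs, _ in rollout:
        if label in weak:
            dataset.append((obs, expert(obs)))
            calls += 1
    return calls


if __name__ == "__main__":
    # Toy rollout: scalar observations and made-up errors per segment.
    rollout = [
        ("straight", 0.1, 0.02),
        ("left_turn", 0.5, 0.30),
        ("left_turn", 0.6, 0.40),
        ("straight", 0.0, 0.01),
    ]
    dataset = []
    # Hypothetical expert: steers opposite to the observed offset.
    calls = selective_aggregate(dataset, rollout, lambda obs: -obs, threshold=0.1)
    print(calls, dataset)
```

In this toy setup the expert is consulted only for segments labeled `left_turn`, so the expert-call budget is spent where the learned policy performs worst instead of being spread uniformly over all segments.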
