Transferring Multi-Agent Reinforcement Learning Policies for Autonomous Driving using Sim-to-Real

Paper Title

Transferring Multi-Agent Reinforcement Learning Policies for Autonomous Driving using Sim-to-Real

Authors

Eduardo Candela, Leandro Parada, Luis Marques, Tiberiu-Andrei Georgescu, Yiannis Demiris, Panagiotis Angeloudis

Abstract

Autonomous driving requires high levels of coordination and collaboration between agents. Achieving effective coordination in multi-agent systems is a difficult task that remains largely unresolved. Multi-Agent Reinforcement Learning has emerged as a powerful method for this task because it accounts for the interactions between agents and allows for decentralized training, which makes it highly scalable. However, transferring policies from simulation to the real world is a major challenge, even for single-agent applications. Multi-agent systems add further complexity to the Sim-to-Real gap because of agent collaboration and environment synchronization. In this paper, we propose a method to transfer multi-agent autonomous driving policies to the real world. To this end, we create a multi-agent environment that imitates the dynamics of the Duckietown multi-robot testbed, and train multi-agent policies using the MAPPO algorithm with different levels of domain randomization. We then transfer the trained policies to the Duckietown testbed and compare MAPPO against a traditional rule-based method. We show that the rewards of the policies transferred with MAPPO and domain randomization are, on average, 1.85 times those of the rule-based method. Moreover, we show that different levels of parameter randomization have a substantial impact on the Sim-to-Real gap.
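The abstract mentions training with "different levels of domain randomization" but does not say which simulator parameters are randomized or how a level is defined. The following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a level is a fraction by which each parameter may deviate from its nominal value, and the parameter names (`wheel_gain`, `motor_trim`, `action_delay_s`) and their nominal values are hypothetical placeholders.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    # Hypothetical simulator parameters; the abstract does not name the real ones.
    wheel_gain: float
    motor_trim: float
    action_delay_s: float


# Assumed nominal values, chosen only for illustration.
NOMINAL = SimParams(wheel_gain=1.0, motor_trim=0.0, action_delay_s=0.1)


def sample_params(level: float, rng: random.Random) -> SimParams:
    """Draw one set of simulator parameters for a training episode.

    level = 0.0 returns the nominal parameters; level = 0.2 lets each
    parameter deviate by up to 20% of its scale in either direction.
    """
    if level < 0:
        raise ValueError("level must be non-negative")

    def jitter(value: float, scale: float) -> float:
        # Uniform perturbation in [-level * scale, +level * scale].
        return value + rng.uniform(-level, level) * scale

    return SimParams(
        wheel_gain=jitter(NOMINAL.wheel_gain, NOMINAL.wheel_gain),
        # The nominal trim is zero, so an assumed fixed scale of 0.1 is used.
        motor_trim=jitter(NOMINAL.motor_trim, 0.1),
        # A delay cannot be negative, so clamp at zero.
        action_delay_s=max(0.0, jitter(NOMINAL.action_delay_s, NOMINAL.action_delay_s)),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    for level in (0.0, 0.1, 0.3):
        print(level, sample_params(level, rng))
```

In a training loop, `sample_params` would be called at every environment reset so that each episode sees slightly different dynamics. Comparing policies trained at several values of `level` is one way to study how the amount of randomization affects the Sim-to-Real gap.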
