Paper Title
Offline Reinforcement Learning for Road Traffic Control
Paper Authors
Paper Abstract
Traffic signal control is an important problem in urban mobility with significant potential for economic and environmental impact. While there is growing interest in Reinforcement Learning (RL) for traffic signal control, the work so far has focused on learning through simulations, which can lead to inaccuracies due to simplifying assumptions. Instead, real experience data on traffic is available and could be exploited at minimal cost. Recent progress in offline or batch RL has enabled just that. Model-based offline RL methods, in particular, have been shown to generalize from the experience data much better than other approaches. We build a model-based learning framework which infers a Markov Decision Process (MDP) from a dataset collected under a cyclic traffic signal control policy; such data is both commonplace and easy to gather. The MDP is built with pessimistic costs that manage out-of-distribution scenarios through an adaptive shaping of rewards, which is shown to be PAC-optimal and to provide better regularization than prior related work. Our model is evaluated on a complex signalized roundabout, showing that highly performant traffic control policies can be built in a data-efficient manner.
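To make the general idea concrete, the sketch below is a minimal, hypothetical illustration of pessimism in model-based offline RL: a tabular MDP is estimated from logged transitions and rewards are penalized for poorly supported (out-of-distribution) state-action pairs. The function name build_pessimistic_mdp, the count-based penalty form, and the tabular setting are illustrative assumptions only; they are not the adaptive reward-shaping construction proposed in the paper.

    # Hypothetical sketch, not the paper's code: estimate a tabular MDP from
    # logged (state, action, reward, next_state) tuples and apply a pessimistic
    # penalty to state-action pairs with little data support.
    from collections import defaultdict
    import numpy as np

    def build_pessimistic_mdp(transitions, n_states, n_actions,
                              penalty=1.0, min_count=5):
        counts = np.zeros((n_states, n_actions))
        reward_sum = np.zeros((n_states, n_actions))
        next_counts = defaultdict(lambda: np.zeros(n_states))

        for s, a, r, s_next in transitions:
            counts[s, a] += 1
            reward_sum[s, a] += r
            next_counts[(s, a)][s_next] += 1

        # Uniform fallback transition model for unseen state-action pairs.
        P = np.full((n_states, n_actions, n_states), 1.0 / n_states)
        R = np.zeros((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                c = counts[s, a]
                if c > 0:
                    P[s, a] = next_counts[(s, a)] / c
                    R[s, a] = reward_sum[s, a] / c
                # Pessimism: subtract a penalty that shrinks as data support grows.
                R[s, a] -= penalty * max(0.0, 1.0 - c / min_count)
        return P, R

    # Example usage with a tiny synthetic log collected under a fixed policy.
    rng = np.random.default_rng(0)
    log = [(rng.integers(4), rng.integers(2), rng.normal(), rng.integers(4))
           for _ in range(200)]
    P, R = build_pessimistic_mdp(log, n_states=4, n_actions=2)

The count-based penalty is only the simplest way to discourage a planner from exploiting state-action pairs the logged data cannot support; the abstract's adaptive reward shaping plays the same regularizing role with stronger (PAC-optimality) guarantees.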