Quantifying the Impact of Non-Stationarity in Reinforcement Learning-Based Traffic Signal Control

Paper title

Quantifying the Impact of Non-Stationarity in Reinforcement Learning-Based Traffic Signal Control

Authors

Alegre, Lucas N., Bazzan, Ana L. C., da Silva, Bruno C.

Abstract

In reinforcement learning (RL), dealing with non-stationarity is a challenging issue. However, some domains such as traffic optimization are inherently non-stationary. Causes for and effects of this are manifold. In particular, when dealing with traffic signal controls, addressing non-stationarity is key since traffic conditions change over time and as a function of traffic control decisions taken in other parts of a network. In this paper we analyze the effects that different sources of non-stationarity have in a network of traffic signals, in which each signal is modeled as a learning agent. More precisely, we study both the effects of changing the context in which an agent learns (e.g., a change in flow rates experienced by it), as well as the effects of reducing agent observability of the true environment state. Partial observability may cause distinct states (in which distinct actions are optimal) to be seen as the same by the traffic signal agents. This, in turn, may lead to sub-optimal performance. We show that the lack of suitable sensors to provide a representative observation of the real state seems to affect the performance more drastically than the changes to the underlying traffic patterns.
