Paper Title


Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games

Authors

Batuhan Yardim, Semih Cayci, Matthieu Geist, Niao He

Abstract


Mean-field games have been used as a theoretical tool to obtain an approximate Nash equilibrium for symmetric and anonymous $N$-player games. However, limiting applicability, existing theoretical results assume variations of a "population generative model", which allows arbitrary modifications of the population distribution by the learning algorithm. Moreover, learning algorithms typically work on abstract simulators with population instead of the $N$-player game. Instead, we show that $N$ agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\widetilde{\mathcal{O}}(\varepsilon^{-2})$ samples from a single sample trajectory without a population generative model, up to a standard $\mathcal{O}(\frac{1}{\sqrt{N}})$ error due to the mean field. Taking a divergent approach from the literature, instead of working with the best-response map we first show that a policy mirror ascent map can be used to construct a contractive operator having the Nash equilibrium as its fixed point. We analyze single-path TD learning for $N$-agent games, proving sample complexity guarantees by only using a sample path from the $N$-agent simulator without a population generative model. Furthermore, we demonstrate that our methodology allows for independent learning by $N$ agents with finite sample guarantees.
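The abstract describes two ingredients: a single-path TD evaluation step and a policy mirror ascent (KL-regularized) policy update whose fixed point is the regularized Nash equilibrium. The following is a minimal illustrative sketch of what such a tabular update could look like for one agent; the helper `env_step`, the step sizes `lr` and `eta`, and the entropy weight `tau` are assumptions for illustration, not the paper's exact algorithm or constants.

```python
# Hedged sketch: tabular single-path TD evaluation + one policy mirror ascent step.
# All hyperparameters and the env_step interface are hypothetical placeholders.
import numpy as np

def td_q_evaluation(env_step, policy, num_states, num_actions,
                    horizon=10_000, lr=0.05, gamma=0.99, tau=0.1):
    """Estimate an entropy-regularized Q-function of `policy` from a single
    sample path using temporal-difference updates (no generative model)."""
    Q = np.zeros((num_states, num_actions))
    s = 0  # assumed initial state
    for _ in range(horizon):
        a = np.random.choice(num_actions, p=policy[s])
        s_next, r = env_step(s, a)  # one transition from the N-agent simulator
        # soft (entropy-regularized) value of the next state under the current policy
        v_next = policy[s_next] @ (Q[s_next] - tau * np.log(policy[s_next] + 1e-12))
        Q[s, a] += lr * (r + gamma * v_next - Q[s, a])
        s = s_next
    return Q

def mirror_ascent_step(policy, Q, eta=1.0, tau=0.1):
    """One KL-regularized policy mirror ascent update:
    pi_new(a|s) ∝ pi_old(a|s)^(1 - eta*tau) * exp(eta * Q(s, a))."""
    logits = (1.0 - eta * tau) * np.log(policy + 1e-12) + eta * Q
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum(axis=1, keepdims=True)
```

In this sketch each agent repeats evaluate-then-update on its own trajectory, which mirrors the independent-learning setting the abstract refers to; the paper's analysis of contraction and sample complexity is not reproduced here.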
