Paper Title


Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games

Authors

Batuhan Yardim, Semih Cayci, Matthieu Geist, Niao He

Abstract


Mean-field games have been used as a theoretical tool to obtain an approximate Nash equilibrium for symmetric and anonymous $N$-player games. However, limiting applicability, existing theoretical results assume variations of a "population generative model", which allows arbitrary modifications of the population distribution by the learning algorithm. Moreover, learning algorithms typically work on abstract simulators with population instead of the $N$-player game. Instead, we show that $N$ agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\widetilde{\mathcal{O}}(\varepsilon^{-2})$ samples from a single sample trajectory without a population generative model, up to a standard $\mathcal{O}(\frac{1}{\sqrt{N}})$ error due to the mean field. Taking a divergent approach from the literature, instead of working with the best-response map we first show that a policy mirror ascent map can be used to construct a contractive operator having the Nash equilibrium as its fixed point. We analyze single-path TD learning for $N$-agent games, proving sample complexity guarantees by only using a sample path from the $N$-agent simulator without a population generative model. Furthermore, we demonstrate that our methodology allows for independent learning by $N$ agents with finite sample guarantees.
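The abstract describes two ingredients: a single-path TD evaluation step and a policy mirror ascent (KL-regularized) policy update whose fixed point is the regularized Nash equilibrium. The following is a minimal illustrative sketch of what such a tabular update could look like for one agent; the helper `env_step`, the step sizes `lr` and `eta`, and the entropy weight `tau` are assumptions for illustration, not the paper's exact algorithm or constants.

```python
# Hedged sketch: tabular single-path TD evaluation + one policy mirror ascent step.
# All hyperparameters and the env_step interface are hypothetical placeholders.
import numpy as np

def td_q_evaluation(env_step, policy, num_states, num_actions,
                    horizon=10_000, lr=0.05, gamma=0.99, tau=0.1):
    """Estimate an entropy-regularized Q-function of `policy` from a single
    sample path using temporal-difference updates (no generative model)."""
    Q = np.zeros((num_states, num_actions))
    s = 0  # assumed initial state
    for _ in range(horizon):
        a = np.random.choice(num_actions, p=policy[s])
        s_next, r = env_step(s, a)  # one transition from the N-agent simulator
        # soft (entropy-regularized) value of the next state under the current policy
        v_next = policy[s_next] @ (Q[s_next] - tau * np.log(policy[s_next] + 1e-12))
        Q[s, a] += lr * (r + gamma * v_next - Q[s, a])
        s = s_next
    return Q

def mirror_ascent_step(policy, Q, eta=1.0, tau=0.1):
    """One KL-regularized policy mirror ascent update:
    pi_new(a|s) ∝ pi_old(a|s)^(1 - eta*tau) * exp(eta * Q(s, a))."""
    logits = (1.0 - eta * tau) * np.log(policy + 1e-12) + eta * Q
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum(axis=1, keepdims=True)
```

In this sketch each agent repeats evaluate-then-update on its own trajectory, which mirrors the independent-learning setting the abstract refers to; the paper's analysis of contraction and sample complexity is not reproduced here.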
