关于混合策略的不可能，不后悔学习

论文标题

关于混合策略的不可能，不后悔学习

On the Impossibility of Convergence of Mixed Strategies with No Regret Learning

论文作者

Muthukumar, Vidya, Phade, Soham, Sahai, Anant

论文摘要

我们研究了由最佳的无重格学习策略在重复的游戏设置中引起的混合策略的限制行为，在该游戏设置中，舞台游戏是2 x 2竞争性游戏。我们考虑了最佳的无重格算法，这些算法在其论点中是基于平均值且单调的。我们表明，对于任何这种算法，玩家的混合策略几乎肯定不能肯定地融合到任何NASH平衡。还表明，这种负面结果还可以在这些假设的广泛放松下，包括乐观和/或自适应阶梯尺寸的流行在线摩尔群落变体。最后，我们猜想可以去除单调性假设，并为此猜想提供部分证据。我们的结果将玩家实现的固有随机性确定为在使用对手的混合物和实现进行更新之间的结果中的关键因素。

We study the limiting behavior of the mixed strategies that result from optimal no-regret learning strategies in a repeated game setting where the stage game is any 2 by 2 competitive game. We consider optimal no-regret algorithms that are mean-based and monotonic in their argument. We show that for any such algorithm, the limiting mixed strategies of the players cannot converge almost surely to any Nash equilibrium. This negative result is also shown to hold under a broad relaxation of these assumptions, including popular variants of Online-Mirror-Descent with optimism and/or adaptive step-sizes. Finally, we conjecture that the monotonicity assumption can be removed, and provide partial evidence for this conjecture. Our results identify the inherent stochasticity in players' realizations as a critical factor underlying this divergence in outcomes between using the opponent's mixtures and realizations to make updates.

下载PDF全文

下载文献需遵守相关版权规定

论文标题