HSVI可以解决零和部分可观察到的随机游戏

论文标题

HSVI可以解决零和部分可观察到的随机游戏

HSVI can solve zero-sum Partially Observable Stochastic Games

论文作者

Delage, Aurélien, Buffet, Olivier, Dibangoye, Jilles S., Saffidine, Abdallah

论文摘要

解决2播放器零和不完美的信息游戏的最新方法依赖于线性编程或遗憾最小化，尽管不是动态编程（DP）或启发式搜索（HS），而后者通常是针对其他顺序决策问题的最先进的解决方案的核心。在部分可观察或协作的设置（例如POMDP和DEC-POMDP）中，DP和HS需要引入适当的统计量，该统计量诱发了最佳值函数的完全可观察到的问题以及界限（凸）近似值。这种方法在2个玩家零和部分可观察到的随机游戏（ZS-POSG）的某些子类中也取得了成功，但是如何将其应用于一般情况仍然仍然是一个悬而未决的问题。 We answer it by (i) rigorously defining an equivalent game to work with, (ii) proving mathematical properties of the optimal value function that allow deriving bounds that come with solution strategies, (iii) proposing for the first time an HSVI-like solver that provably converges to an $ε$-optimal solution in finite time, and (iv) empirically analyzing it.这为一种新颖的有前途的方法打开了依靠线性编程或迭代方法的新手家族。

State-of-the-art methods for solving 2-player zero-sum imperfect information games rely on linear programming or regret minimization, though not on dynamic programming (DP) or heuristic search (HS), while the latter are often at the core of state-of-the-art solvers for other sequential decision-making problems. In partially observable or collaborative settings (e.g., POMDPs and Dec- POMDPs), DP and HS require introducing an appropriate statistic that induces a fully observable problem as well as bounding (convex) approximators of the optimal value function. This approach has succeeded in some subclasses of 2-player zero-sum partially observable stochastic games (zs- POSGs) as well, but how to apply it in the general case still remains an open question. We answer it by (i) rigorously defining an equivalent game to work with, (ii) proving mathematical properties of the optimal value function that allow deriving bounds that come with solution strategies, (iii) proposing for the first time an HSVI-like solver that provably converges to an $ε$-optimal solution in finite time, and (iv) empirically analyzing it. This opens the door to a novel family of promising approaches complementing those relying on linear programming or iterative methods.

下载PDF全文

下载文献需遵守相关版权规定

论文标题