Paper Title

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

Paper Authors

Perolat, Julien, de Vylder, Bart, Hennes, Daniel, Tarassov, Eugene, Strub, Florian, de Boer, Vincent, Muller, Paul, Connor, Jerome T., Burch, Neil, Anthony, Thomas, McAleer, Stephen, Elie, Romuald, Cen, Sarah H., Wang, Zhe, Gruslys, Audrunas, Malysheva, Aleksandra, Khan, Mina, Ozair, Sherjil, Timbers, Finbarr, Pohlen, Toby, Eccles, Tom, Rowland, Mark, Lanctot, Marc, Lespiau, Jean-Baptiste, Piot, Bilal, Omidshafiei, Shayegan, Lockhart, Edward, Sifre, Laurent, Beauguerlange, Nathalie, Munos, Remi, Silver, David, Singh, Satinder, Hassabis, Demis, Tuyls, Karl

Paper Abstract

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego cannot easily be broken down into manageably sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.
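The abstract's central claim is that R-NaD avoids the 'cycling' behaviour that plain self-play learning dynamics exhibit in zero-sum games: it regularises the reward towards a reference policy, runs the dynamics to a fixed point, and then re-anchors the reference at that fixed point. The sketch below illustrates this idea on rock-paper-scissors, where unregularised replicator-style dynamics orbit the Nash equilibrium forever. This is a minimal toy illustration under simplifying assumptions (a matrix game with explicit policy vectors, a discretised replicator update, and hypothetical parameters `eta` and `dt`), not DeepNash's implementation, which applies the analogous reward transformation inside a model-free deep reinforcement learning loop over the full Stratego state space.

```python
# Toy sketch of the Regularised Nash Dynamics (R-NaD) idea on
# rock-paper-scissors. NOT DeepNash itself: just the reward-regularisation
# and reference-update scheme on a 3x3 zero-sum matrix game.
import numpy as np

# Payoff matrix for the row player (rock, paper, scissors);
# the game is zero-sum, so the column player's payoff is -A.
A = np.array([[ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0],
              [-1.0,  1.0,  0.0]])

def regularised_replicator_step(x, y, rho_x, rho_y, eta=0.2, dt=0.05):
    """One Euler step of replicator-style dynamics whose payoffs are
    KL-regularised towards reference policies rho_x, rho_y."""
    # Regularised payoff: base payoff minus eta * log(policy / reference).
    u_x = A @ y - eta * np.log(x / rho_x)
    u_y = -A.T @ x - eta * np.log(y / rho_y)
    # Multiplicative update on the advantage over the mean payoff.
    x = x * np.exp(dt * (u_x - x @ u_x))
    y = y * np.exp(dt * (u_y - y @ u_y))
    return x / x.sum(), y / y.sum()

x = np.array([0.8, 0.1, 0.1])   # start deliberately far from Nash
y = np.array([0.1, 0.8, 0.1])
rho_x, rho_y = x.copy(), y.copy()

for outer in range(50):             # outer loop: update reference policies
    for _ in range(500):            # inner loop: run dynamics to a fixed point
        x, y = regularised_replicator_step(x, y, rho_x, rho_y)
    rho_x, rho_y = x.copy(), y.copy()  # re-anchor the regularisation

print(np.round(x, 3), np.round(y, 3))  # both approach (1/3, 1/3, 1/3)
```

With the regularisation removed (`eta = 0`), the same update cycles around the uniform equilibrium rather than approaching it; with it, each inner loop settles at the regularised fixed point, and moving the reference policies to that point drives both players towards the Nash equilibrium (1/3, 1/3, 1/3). This mirrors, in miniature, the convergence behaviour the abstract attributes to R-NaD.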
