Suphx: Mastering Mahjong with Deep Reinforcement Learning

Paper Title

Suphx: Mastering Mahjong with Deep Reinforcement Learning

Authors

Junjie Li, Sotetsu Koyamada, Qiwei Ye, Guoqing Liu, Chao Wang, Ruihan Yang, Li Zhao, Tao Qin, Tie-Yan Liu, Hsiao-Wuen Hon

Abstract

Artificial Intelligence (AI) has achieved great success in many domains, and game AI has been widely regarded as its beachhead since the dawn of AI. In recent years, studies on game AI have gradually evolved from relatively simple environments (e.g., perfect-information games such as Go, chess, and shogi, or two-player imperfect-information games such as heads-up Texas hold'em) to more complex ones (e.g., multi-player imperfect-information games such as multi-player Texas hold'em and StarCraft II). Mahjong is a multi-player imperfect-information game that is popular worldwide but very challenging for AI research due to its complex playing/scoring rules and rich hidden information. We design an AI for Mahjong, named Suphx, based on deep reinforcement learning with some newly introduced techniques including global reward prediction, oracle guiding, and run-time policy adaptation. Suphx has demonstrated stronger performance than most top human players in terms of stable rank and is rated above 99.99% of all the officially ranked human players on the Tenhou platform. This is the first time that a computer program has outperformed most top human players in Mahjong.
