PerfectDou: Dominating DouDizhu with Perfect Information Distillation

Paper title

PerfectDou: Dominating DouDizhu with Perfect Information Distillation

Authors

Guan Yang, Minghuan Liu, Weijun Hong, Weinan Zhang, Fei Fang, Guangjun Zeng, Yue Lin

Abstract

As a challenging multi-player card game, DouDizhu has recently drawn much attention for analyzing competition and collaboration in imperfect-information games. In this paper, we propose PerfectDou, a state-of-the-art DouDizhu AI system that dominates the game, built in an actor-critic framework with a proposed technique named perfect information distillation. In detail, we adopt a perfect-training-imperfect-execution framework that allows the agents to use global information to guide the training of the policies as if it were a perfect-information game, while the trained policies can still be used to play the imperfect-information game during actual gameplay. To this end, we characterize card and game features for DouDizhu to represent the perfect and imperfect information. To train our system, we adopt proximal policy optimization with generalized advantage estimation in a parallel training paradigm. In experiments, we show how and why PerfectDou beats all existing AI programs and achieves state-of-the-art performance.
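
The abstract names generalized advantage estimation (GAE) as part of the training recipe. As a rough illustration only, and not the authors' implementation, the standard GAE recursion can be sketched as below. The function name `gae_advantages`, its parameters, and the default values of `gamma` and `lam` are assumptions made for this sketch. In a perfect-training-imperfect-execution setup, the `values` passed in would presumably come from a critic that sees global information, while the policy being updated sees only its own observation.

```python
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],
    gamma: float = 0.99,  # discount factor (assumed default, not from the paper)
    lam: float = 0.95,    # GAE smoothing parameter (assumed default)
) -> List[float]:
    """Standard GAE over one episode.

    `values` must hold len(rewards) + 1 entries: one value estimate per
    step plus a bootstrap value for the state after the last step
    (0.0 if the episode terminated).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the accumulated future advantage.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

For a sparse win/loss reward such as a card game, `rewards` would be zero everywhere except the final step, and the advantages propagate that terminal signal back through earlier moves.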
