Paper Title
On Efficient Reinforcement Learning for Full-length Game of StarCraft II
Paper Authors
Paper Abstract
StarCraft II (SC2) poses a grand challenge for reinforcement learning (RL), whose main difficulties include a huge state space, a varying action space, and a long time horizon. In this work, we investigate a set of RL techniques for the full-length game of StarCraft II. We investigate a hierarchical RL approach involving extracted macro-actions and a hierarchical architecture of neural networks. We investigate a curriculum transfer training procedure and train the agent on a single machine with 4 GPUs and 48 CPU threads. On a 64x64 map and using restricted units, we achieve a win rate of 99% against the level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we achieve a 93% win rate against the most difficult non-cheating built-in AI (level-7). In this extended version of the paper, we improve our architecture to train the agent against the cheating-level AIs and achieve win rates of 96%, 97%, and 94% against the level-8, level-9, and level-10 AIs, respectively. Our code is at https://github.com/liuruoze/HierNet-SC2. To provide a baseline for our work, as well as for the research and open-source community, we reproduce a scaled-down version of AlphaStar, called mini-AlphaStar (mAS). The latest version of mAS is 1.07, which can be trained on the raw action space with 564 actions. It is designed to be trained on a single common machine by making the hyper-parameters adjustable. We then compare our work with mAS using the same resources and show that our method is more effective. The code of mini-AlphaStar is at https://github.com/liuruoze/mini-AlphaStar. We hope our study sheds some light on future research into efficient reinforcement learning for SC2 and other large-scale games.
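To make the hierarchical idea in the abstract concrete, the following is a minimal, illustrative PyTorch sketch of a two-level policy: a high-level controller chooses which sub-policy acts, and the chosen sub-policy samples one extracted macro-action. This is not the authors' HierNet-SC2 implementation; the network sizes, the number of sub-policies, and the obs_dim / macro_actions_per_subpolicy parameters are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class SubPolicy(nn.Module):
    """Low-level network that scores the macro-actions of one sub-task."""

    def __init__(self, obs_dim: int, num_macro_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, num_macro_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))


class HierarchicalPolicy(nn.Module):
    """High-level controller picks a sub-policy; that sub-policy picks a macro-action."""

    def __init__(self, obs_dim: int, macro_actions_per_subpolicy=(8, 8)):
        super().__init__()
        self.controller = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, len(macro_actions_per_subpolicy)),
        )
        self.sub_policies = nn.ModuleList(
            SubPolicy(obs_dim, n) for n in macro_actions_per_subpolicy
        )

    def act(self, obs: torch.Tensor):
        # The controller selects which sub-policy (e.g. economy vs. combat) acts now.
        sub_idx = torch.distributions.Categorical(
            logits=self.controller(obs)).sample().item()
        # The selected sub-policy samples one extracted macro-action.
        macro_action = self.sub_policies[sub_idx](obs).sample().item()
        return sub_idx, macro_action


# Toy usage: a random vector stands in for flattened SC2 observation features.
policy = HierarchicalPolicy(obs_dim=32)
print(policy.act(torch.randn(32)))
```

In a curriculum transfer setting, the same policy weights would simply continue training as the built-in AI difficulty is raised once the recent win rate at the current level passes a chosen threshold.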