Paper Title

Lamarckian Platform: Pushing the Boundaries of Evolutionary Reinforcement Learning towards Asynchronous Commercial Games

Authors

Hui Bai, Ruimin Shen, Yue Lin, Botian Xu, Ran Cheng

Abstract

Despite emerging progress in integrating evolutionary computation into reinforcement learning, the absence of a high-performance platform offering composability and massive parallelism causes non-trivial difficulties for research and applications related to asynchronous commercial games. Here we introduce Lamarckian, an open-source platform that supports evolutionary reinforcement learning and scales to distributed computing resources. To improve training speed and data efficiency, Lamarckian adopts optimized communication methods and an asynchronous evolutionary reinforcement learning workflow. To meet the demand for an asynchronous interface from commercial games and various methods, Lamarckian tailors an asynchronous Markov Decision Process interface and designs an object-oriented software architecture with decoupled modules. In comparison with the state-of-the-art RLlib, we empirically demonstrate the unique advantages of Lamarckian on benchmark tests with up to 6000 CPU cores: i) both the sampling efficiency and the training speed are doubled when running PPO on the Google football game; ii) the training speed is 13 times faster when running PBT+PPO on the Pong game. Moreover, we also present two use cases: i) how Lamarckian is applied to generating behavior-diverse game AI; ii) how Lamarckian is applied to game balancing tests for an asynchronous commercial game.
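To make the idea of an asynchronous Markov Decision Process interface more concrete, the following is a minimal sketch in Python using the standard `asyncio` module. It is not Lamarckian's actual API: the class `AsyncCountdownMDP`, the functions `rollout`, `evaluate`, and `run_demo`, and the toy reward rule are all hypothetical names and assumptions chosen for illustration. The sketch only shows the general pattern in which `reset` and `step` are awaitable, so several environment instances can be advanced concurrently while each waits on its own (possibly slow) game process.

```python
import asyncio


class AsyncCountdownMDP:
    """Hypothetical toy environment; not part of Lamarckian's API."""

    def __init__(self, horizon: int = 5):
        self.horizon = horizon  # number of steps before the episode ends
        self.t = 0

    async def reset(self) -> int:
        self.t = 0
        return self.t

    async def step(self, action: int):
        # Yield to the event loop; this stands in for waiting on a game server.
        await asyncio.sleep(0)
        self.t += 1
        reward = 1.0 if action == 1 else 0.0  # assumed toy reward rule
        done = self.t >= self.horizon
        return self.t, reward, done


async def rollout(env, policy) -> float:
    """Run one episode and return the total reward."""
    state = await env.reset()
    total, done = 0.0, False
    while not done:
        state, reward, done = await env.step(policy(state))
        total += reward
    return total


async def evaluate(num_envs: int = 4):
    """Advance several environments concurrently with a fixed policy."""
    envs = [AsyncCountdownMDP() for _ in range(num_envs)]
    always_one = lambda state: 1  # trivial policy: always choose action 1
    return await asyncio.gather(*(rollout(env, always_one) for env in envs))


def run_demo(num_envs: int = 4):
    return asyncio.run(evaluate(num_envs))
```

Calling `run_demo(3)` runs three episodes concurrently and returns one total reward per environment. In a real setting the `await` inside `step` would wait on network or process I/O, which is where an asynchronous interface is expected to pay off.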