Paper Title
Warm-Start AlphaZero Self-Play Search Enhancements
Paper Authors
Paper Abstract
Recently, AlphaZero has achieved landmark results in deep reinforcement learning, by providing a single self-play architecture that learned three different games at superhuman level. AlphaZero is a large and complicated system with many parameters, and success requires considerable compute power and fine-tuning. Reproducing results in other games is a challenge, and many researchers are looking for ways to improve results while reducing computational demands. AlphaZero's design is purely based on self-play and makes no use of labeled expert data or domain-specific enhancements; it is designed to learn from scratch. We propose a novel approach to deal with this cold-start problem by employing simple search enhancements at the beginning phase of self-play training, namely Rollout, Rapid Action Value Estimate (RAVE), dynamically weighted combinations of these with the neural network, and Rolling Horizon Evolutionary Algorithms (RHEA). Our experiments indicate that most of these enhancements improve the performance of their baseline player in three different (small) board games, with especially the RAVE-based variants playing strongly.
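To make the dynamically weighted combination concrete, the sketch below shows one plausible reading of the warm-start idea: during the first iterations of self-play, the MCTS leaf value is a blend of a classic estimator (Rollout or RAVE) and the network's value head, with the weight shifting toward the network as training proceeds. The function names, the linear decay schedule, and the warm_start_iters parameter are illustrative assumptions, not the paper's exact scheme.

    def warm_start_leaf_value(state, net_value_fn, rollout_fn,
                              iteration, warm_start_iters=10):
        # Blend a classic estimator (e.g. random rollouts or RAVE) with
        # the neural-network value during the warm-start phase of
        # self-play training.  The linear decay schedule and all names
        # here are assumptions for illustration, not the authors' exact
        # formulation.
        w = max(0.0, 1.0 - iteration / warm_start_iters)  # weight on classic estimate
        v_classic = rollout_fn(state)   # e.g. mean outcome of a few random playouts
        v_net = net_value_fn(state)     # value head of the (still untrained) network
        return w * v_classic + (1.0 - w) * v_net

    # Toy usage: an early iteration leans on the rollout estimate,
    # a late one on the network value.
    v_early = warm_start_leaf_value(None, lambda s: 0.8, lambda s: 0.2, iteration=1)
    v_late = warm_start_leaf_value(None, lambda s: 0.8, lambda s: 0.2, iteration=9)

After the warm-start phase ends (iteration >= warm_start_iters) the weight reaches zero and the player falls back to the standard AlphaZero evaluation, using the network value alone.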