Evaluating the Rainbow DQN Agent in Hanabi with Unseen Partners

Paper Title

Evaluating the Rainbow DQN Agent in Hanabi with Unseen Partners

Authors

Canaan, Rodrigo; Gao, Xianbo; Chung, Youjin; Togelius, Julian; Nealen, Andy; Menzel, Stefan

Abstract

Hanabi is a cooperative game that challenges existing AI techniques due to its focus on modeling the mental states of other players to interpret and predict their behavior. While there are agents that can achieve near-perfect scores in the game by agreeing on some shared strategy, comparatively little progress has been made in ad-hoc cooperation settings, where partners and strategies are not known in advance. In this paper, we show that agents trained through self-play using the popular Rainbow DQN architecture fail to cooperate well with simple rule-based agents that were not seen during training and, conversely, when these agents are trained to play with any individual rule-based agent, or even a mix of these agents, they fail to achieve good self-play scores.
