Paper Title
Evaluating the Rainbow DQN Agent in Hanabi with Unseen Partners
Authors
Abstract
Hanabi is a cooperative game that challenges existing AI techniques due to its focus on modeling the mental states of other players to interpret and predict their behavior. While there are agents that can achieve near-perfect scores in the game by agreeing on some shared strategy, comparatively little progress has been made in ad-hoc cooperation settings, where partners and strategies are not known in advance. In this paper, we show that agents trained through self-play using the popular Rainbow DQN architecture fail to cooperate well with simple rule-based agents that were not seen during training. Conversely, when these agents are trained to play with any individual rule-based agent, or even a mix of these agents, they fail to achieve good self-play scores.