Paper Title
Deep Reinforcement Learning for FlipIt Security Game
Paper Authors
Paper Abstract
Reinforcement learning has shown much success in games such as chess, backgammon, and Go. However, in most of these games, agents have full knowledge of the environment at all times. In this paper, we describe a deep learning model in which agents successfully adapt to different classes of opponents and learn the optimal counter-strategy using reinforcement learning in a game under partial observability. We apply our model to FlipIt, a two-player security game in which both players, the attacker and the defender, compete for ownership of a shared resource and receive information on the current state of the game only upon making a move. Our model is a deep neural network combined with Q-learning, trained to maximize the defender's time of ownership of the resource. Despite the noisy information, our model successfully learns a cost-effective counter-strategy that outperforms its opponent's strategies, demonstrating the advantages of deep reinforcement learning in game-theoretic scenarios. We also extend FlipIt to a game with a larger action space by introducing a new lower-cost move, and generalize the model to $n$-player FlipIt.
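The two ingredients the abstract describes, a FlipIt environment in which a player learns the true game state only upon flipping, and a Q-learning agent backed by a neural network, can be sketched as follows. This is a minimal illustration under our own assumptions (discrete time, a single periodic opponent standing in for the paper's "different classes of opponents", and hypothetical names such as `FlipItEnv` and `QNet`); the paper's actual architecture, opponent strategies, and reward design may differ.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

class FlipItEnv:
    """Discrete-time two-player FlipIt (hypothetical simplification).

    The most recent flipper owns the shared resource. The learning agent
    only observes the opponent's true last flip time when it flips itself,
    which models the partial observability described in the abstract."""

    def __init__(self, flip_cost=4.0, opponent_period=10, horizon=200):
        self.flip_cost = flip_cost
        self.opponent_period = opponent_period  # simple periodic opponent
        self.horizon = horizon

    def reset(self):
        self.t = 0
        self.last_agent_flip = 0
        self.last_opp_flip = 0
        self.known_opp_flip = 0  # stale estimate, refreshed only on a flip
        return self._obs()

    def _obs(self):
        # Time since own last flip, and time since the last *known* opponent flip.
        return np.array([self.t - self.last_agent_flip,
                         self.t - self.known_opp_flip], dtype=np.float32)

    def step(self, action):
        # Reward: ownership of the resource this tick (ties go to the agent).
        reward = 1.0 if self.last_agent_flip >= self.last_opp_flip else 0.0
        if action == 1:  # flip: pay the move cost and learn the true state
            reward -= self.flip_cost
            self.last_agent_flip = self.t
            self.known_opp_flip = self.last_opp_flip
        if self.t > 0 and self.t % self.opponent_period == 0:
            self.last_opp_flip = self.t  # opponent flips on its fixed schedule
        self.t += 1
        return self._obs(), reward, self.t >= self.horizon

class QNet(nn.Module):
    """Small MLP producing Q-values for the two actions {wait, flip}."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, 2))

    def forward(self, x):
        return self.net(x)

def train(episodes=200, gamma=0.99, eps=0.1, batch=64):
    env, qnet = FlipItEnv(), QNet()
    opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
    buf = deque(maxlen=10_000)  # experience replay buffer
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.randrange(2)
            else:
                with torch.no_grad():
                    a = qnet(torch.from_numpy(s)).argmax().item()
            s2, r, done = env.step(a)
            buf.append((s, a, r, s2, done))
            s = s2
            if len(buf) >= batch:
                sb, ab, rb, s2b, db = map(np.array, zip(*random.sample(buf, batch)))
                q = qnet(torch.from_numpy(sb)).gather(
                    1, torch.from_numpy(ab).long().unsqueeze(1)).squeeze(1)
                with torch.no_grad():
                    nxt = qnet(torch.from_numpy(s2b)).max(1).values
                    tgt = (torch.from_numpy(rb).float()
                           + gamma * (1 - torch.from_numpy(db).float()) * nxt)
                opt.zero_grad()
                nn.functional.mse_loss(q, tgt).backward()
                opt.step()
    return qnet

if __name__ == "__main__":
    train()
```

In this formulation, the extended game with a lower-cost move would simply enlarge the action set (e.g., a third action with a smaller cost and weaker effect), and the $n$-player generalization would track one `last_opp_flip` per opponent; both are assumptions about how the paper's extensions map onto this sketch, not details taken from the abstract.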