Paper Title
Inducing Cooperative behaviour in Sequential-Social dilemmas through Multi-Agent Reinforcement Learning using Status-Quo Loss
Paper Authors
Paper Abstract
In social dilemma situations, individual rationality leads to sub-optimal group outcomes. Several human engagements can be modeled as sequential (multi-step) social dilemmas. However, in contrast to humans, Deep Reinforcement Learning agents trained to optimize individual rewards in sequential social dilemmas converge to selfish, mutually harmful behavior. We introduce a status-quo loss (SQLoss) that encourages an agent to stick to the status quo, rather than repeatedly changing its policy. We show how agents trained with SQLoss evolve cooperative behavior in several social dilemma matrix games. To work with social dilemma games that have visual input, we propose GameDistill. GameDistill uses self-supervision and clustering to automatically extract cooperative and selfish policies from a social dilemma game. We combine GameDistill and SQLoss to show how agents evolve socially desirable cooperative behavior in the Coin Game.
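To make the status-quo idea concrete, below is a minimal sketch of an auxiliary loss term that discourages an agent from repeatedly changing its policy. This is an illustrative assumption only: the function name `status_quo_penalty`, the KL-divergence penalty, and the coefficient `sq_coef` are hypothetical choices for this sketch and are not the paper's actual SQLoss formulation, which the abstract does not specify.

```python
import torch
import torch.nn.functional as F


def status_quo_penalty(curr_logits: torch.Tensor,
                       prev_logits: torch.Tensor,
                       sq_coef: float = 0.1) -> torch.Tensor:
    """Hypothetical auxiliary loss in the spirit of SQLoss.

    Penalizes the KL divergence between the agent's previous action
    distribution (the "status quo", held fixed) and its current one,
    so that gradient updates favor sticking with the previous policy.
    The KL form and sq_coef are assumptions for illustration.
    """
    curr_log_probs = F.log_softmax(curr_logits, dim=-1)
    # Treat the previous policy as a fixed target (no gradient through it).
    prev_probs = F.softmax(prev_logits, dim=-1).detach()
    # KL(prev || curr): grows as the agent moves away from its previous policy.
    kl = torch.sum(
        prev_probs * (torch.log(prev_probs + 1e-8) - curr_log_probs), dim=-1
    )
    return sq_coef * kl.mean()


# Usage: add the penalty to the agent's usual RL objective.
curr_logits = torch.randn(32, 4)   # batch of 32 states, 4 discrete actions
prev_logits = torch.randn(32, 4)   # logits saved from the previous policy
rl_loss = torch.tensor(0.0)        # placeholder for the standard policy loss
total_loss = rl_loss + status_quo_penalty(curr_logits, prev_logits)
```

In this sketch, the penalty simply regularizes policy updates toward the previous policy; the paper's SQLoss is designed so that this kind of inertia, applied to all agents in a sequential social dilemma, stabilizes mutually cooperative behavior rather than the selfish equilibrium.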