Paper Title
Revisiting Parameter Sharing in Multi-Agent Deep Reinforcement Learning
Paper Authors
Paper Abstract
Parameter sharing, where each agent independently learns a policy with fully shared parameters between all policies, is a popular baseline method for multi-agent deep reinforcement learning. Unfortunately, since all agents share the same policy network, they cannot learn different policies or tasks. This issue has been circumvented experimentally by adding an agent-specific indicator signal to observations, a practice we term "agent indication". Agent indication is limited, however, in that without modification it does not allow parameter sharing to be applied to environments whose action spaces and/or observation spaces are heterogeneous. This work formalizes the notion of agent indication and provides, for the first time, a proof that it enables convergence to optimal policies. Next, we formally introduce methods that extend parameter sharing to learning in heterogeneous observation and action spaces, and prove that these methods also allow for convergence to optimal policies. Finally, we experimentally confirm that the methods we introduce work in practice, and conduct a wide array of experiments studying the empirical efficacy of many different agent-indication schemes for image-based observation spaces.
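The two constructions described in the abstract lend themselves to a short illustration. Below is a minimal, hypothetical sketch (not the paper's implementation) of agent indication, where a one-hot agent ID is appended to each observation so a single shared policy network can condition on which agent it controls, combined with zero-padding to reconcile heterogeneous observation sizes. All function names, shapes, and values here are assumptions made purely for illustration.

```python
import numpy as np

def pad_observation(obs: np.ndarray, target_dim: int) -> np.ndarray:
    """Zero-pad a flat observation up to the largest observation size,
    so agents with heterogeneous observation spaces share one input shape."""
    padded = np.zeros(target_dim, dtype=obs.dtype)
    padded[: obs.shape[0]] = obs
    return padded

def indicate_agent(obs: np.ndarray, agent_id: int, num_agents: int) -> np.ndarray:
    """Append a one-hot agent indicator to the (padded) observation,
    letting a fully shared network learn per-agent behavior."""
    one_hot = np.zeros(num_agents, dtype=obs.dtype)
    one_hot[agent_id] = 1.0
    return np.concatenate([obs, one_hot])

# Illustrative example: two agents with different observation sizes end up
# with identically shaped inputs for a single shared policy network.
raw_obs = {0: np.array([0.5, -1.0, 2.0]), 1: np.array([1.0, 0.0])}
max_dim = max(o.shape[0] for o in raw_obs.values())
shared_inputs = {
    aid: indicate_agent(pad_observation(o, max_dim), aid, num_agents=len(raw_obs))
    for aid, o in raw_obs.items()
}
print({aid: x.shape for aid, x in shared_inputs.items()})  # both agents: (5,)
```

A one-hot ID is only one possible indication scheme; the paper's experiments compare many alternatives, particularly for image-based observation spaces, where the indicator must be embedded into the image itself rather than concatenated to a flat vector.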