Paper Title
Agent Modelling under Partial Observability for Deep Reinforcement Learning
Paper Authors
Paper Abstract
Modelling the behaviours of other agents is essential for understanding how agents interact and making effective decisions. Existing methods for agent modelling commonly assume knowledge of the local observations and chosen actions of the modelled agents during execution. To eliminate this assumption, we extract representations from the local information of the controlled agent using encoder-decoder architectures. Using the observations and actions of the modelled agents during training, our models learn to extract representations about the modelled agents conditioned only on the local observations of the controlled agent. The representations are used to augment the controlled agent's decision policy, which is trained via deep reinforcement learning; thus, during execution, the policy does not require access to other agents' information. We provide a comprehensive evaluation and ablation studies in cooperative, competitive and mixed multi-agent environments, showing that our method achieves higher returns than baseline methods which do not use the learned representations.
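The idea described in the abstract can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the paper's exact architecture: the layer shapes, names, and randomly initialised linear maps merely stand in for trained networks. The key structural point it shows is that the decoder (which predicts the modelled agents' observations and actions) is only needed during training, while at execution time the policy consumes the controlled agent's local observation augmented with the learned embedding.

```python
import numpy as np

# Hypothetical dimensions (assumptions, chosen only for illustration).
OBS_DIM = 8          # controlled agent's local observation size
EMBED_DIM = 4        # learned representation size
MODELLED_DIM = 10    # concatenated observations + actions of modelled agents
N_ACTIONS = 3

rng = np.random.default_rng(0)

# Randomly initialised linear layers stand in for trained networks.
W_enc = rng.normal(size=(OBS_DIM, EMBED_DIM)) * 0.1
W_dec = rng.normal(size=(EMBED_DIM, MODELLED_DIM)) * 0.1
W_pi = rng.normal(size=(OBS_DIM + EMBED_DIM, N_ACTIONS)) * 0.1

def encode(local_obs):
    """Embedding computed from the controlled agent's information only."""
    return np.tanh(local_obs @ W_enc)

def decode(embedding):
    """Training-time head: predicts modelled agents' observations/actions.

    A reconstruction loss against the true (training-only) modelled-agent
    data would force the embedding to carry information about them.
    """
    return embedding @ W_dec

def policy_logits(local_obs):
    """Execution-time policy input: local observation plus embedding."""
    z = encode(local_obs)
    return np.concatenate([local_obs, z]) @ W_pi

obs = rng.normal(size=OBS_DIM)
z = encode(obs)                # uses only the controlled agent's observation
recon = decode(z)              # needed only while training the encoder
logits = policy_logits(obs)    # requires no other-agent information

print(z.shape, recon.shape, logits.shape)   # (4,) (10,) (3,)
```

Because `decode` is dropped at execution time, the deployed policy has the same input interface as a standard single-agent policy, just with `EMBED_DIM` extra features.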