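Collaborative Auto-Curricula Multi-Agent Reinforcement Learning with Graph Neural Network Communication Layer for Open-ended Wildfire-Management Resource Distribution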

Paper title

Collaborative Auto-Curricula Multi-Agent Reinforcement Learning with Graph Neural Network Communication Layer for Open-ended Wildfire-Management Resource Distribution

Author

Siedler, Philipp Dominic

Abstract

Most real-world domains can be formulated as multi-agent (MA) systems. Intentionality-sharing agents can solve more complex tasks by collaborating, possibly in less time. True cooperative actions are beneficial for egoistic and collective reasons. However, teaching individual agents to sacrifice egoistic benefits for a better collective performance seems challenging. We build on a recently proposed Multi-Agent Reinforcement Learning (MARL) mechanism with a Graph Neural Network (GNN) communication layer. In that work, the rarely chosen communication actions were only marginally beneficial. Here we propose a MARL system in which agents can help collaborators perform better while risking low individual performance. We conduct our study in the context of resource distribution for wildfire management. Communicating environmental features and partially observable fire occurrence helps the agent collective to pre-emptively distribute resources. Furthermore, we introduce a procedural training environment accommodating auto-curricula and open-endedness towards better generalizability. Our MA communication proposal outperforms a Greedy Heuristic Baseline and a Single-Agent (SA) setup. We further demonstrate how auto-curricula and open-endedness improve the generalizability of our MA proposal.
