Paper Title

Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic

Authors

Wonseok Jeon, Paul Barde, Derek Nowrouzezahrai, Joelle Pineau

Abstract

Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems where we seek to recover both policies for our agents and reward functions that promote expert-like behavior. While MA-AIRL has promising results on cooperative and competitive tasks, it is sample-inefficient and has only been validated empirically for small numbers of agents -- its ability to scale to many agents remains an open question. We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works. Specifically, we employ multi-agent actor-attention-critic (MAAC) -- an off-policy multi-agent RL (MARL) method -- for the RL inner loop of the inverse RL procedure. In doing so, we are able to increase sample efficiency compared to state-of-the-art baselines, across both small- and large-scale tasks. Moreover, the RL agents trained on the rewards recovered by our method better match the experts than those trained on the rewards derived from the baselines. Finally, our method requires far fewer agent-environment interactions, particularly as the number of agents increases.
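The key design choice described in the abstract is to use an off-policy MARL learner (MAAC) as the inner loop of the adversarial inverse-RL procedure, with per-agent discriminators supplying the recovered rewards. The following is a minimal sketch of that outer loop, not the authors' implementation: it uses a simplified GAIL-style discriminator rather than the structured AIRL discriminator, and the names `ToyDiscriminator`, `fake_batch`, and `maac_update` are hypothetical stand-ins.

```python
# Hedged sketch of an adversarial multi-agent IRL loop with an off-policy
# MARL inner step. All components here are illustrative toys, not MA-AIRL/MAAC code.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2

class ToyDiscriminator(nn.Module):
    """Per-agent discriminator D_i(s, a); its logit acts as the learned reward."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))  # raw logit

discriminators = [ToyDiscriminator() for _ in range(N_AGENTS)]
d_optims = [torch.optim.Adam(d.parameters(), lr=3e-4) for d in discriminators]
bce = nn.BCEWithLogitsLoss()

def fake_batch(batch=32):
    # Placeholder for (obs, act) pairs; a real run would draw expert
    # demonstrations and rollouts from the current MAAC policies instead.
    return torch.randn(batch, OBS_DIM), torch.randn(batch, ACT_DIM)

for outer_iter in range(10):                # adversarial inverse-RL outer loop
    for disc, opt in zip(discriminators, d_optims):
        exp_obs, exp_act = fake_batch()     # expert transitions for this agent
        pol_obs, pol_act = fake_batch()     # policy transitions for this agent
        loss = bce(disc(exp_obs, exp_act), torch.ones(32, 1)) + \
               bce(disc(pol_obs, pol_act), torch.zeros(32, 1))
        opt.zero_grad(); loss.backward(); opt.step()

    # Inner RL step: train the agents with an off-policy MARL method (MAAC in
    # the paper) on the rewards implied by the discriminators.
    with torch.no_grad():
        obs, act = fake_batch()
        rewards = [disc(obs, act).squeeze(-1) for disc in discriminators]
    # maac_update(replay_buffer, rewards)   # hypothetical call; MAAC itself omitted
```

Because the inner-loop learner is off-policy, transitions stored in a replay buffer can be relabeled with the current discriminator rewards and reused, which is where the sample-efficiency gain over an on-policy inner loop would come from.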
