Paper Title

Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization

Authors

Jianhao Wang, Zhizhou Ren, Beining Han, Jianing Ye, Chongjie Zhang

Abstract

Value factorization is a popular and promising approach to scaling up multi-agent reinforcement learning in cooperative settings, which balances the learning scalability and the representational capacity of value functions. However, the theoretical understanding of such methods is limited. In this paper, we formalize a multi-agent fitted Q-iteration framework for analyzing factorized multi-agent Q-learning. Based on this framework, we investigate linear value factorization and reveal that multi-agent Q-learning with this simple decomposition implicitly realizes a powerful counterfactual credit assignment, but may not converge in some settings. Through further analysis, we find that on-policy training or richer joint value function classes can improve its local or global convergence properties, respectively. Finally, to support our theoretical implications in practical realization, we conduct an empirical analysis of state-of-the-art deep multi-agent Q-learning algorithms on didactic examples and a broad set of StarCraft II unit micromanagement tasks.
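
The abstract's central technical point is that linear (additive) value factorization fits per-agent utilities against the joint Bellman target, which implicitly assigns credit through the shared TD error but can fail to recover the optimal joint action under off-policy data. The snippet below is a minimal sketch of that idea, not the paper's implementation: the single-state cooperative matrix game, its payoff values, the uniform (off-policy) sampling distribution, and the plain SGD regression step are all illustrative assumptions.

```python
# Minimal sketch (illustrative only) of linear value factorization in a
# one-step cooperative matrix game: Q_tot(a1, a2) = Q_1(a1) + Q_2(a2),
# fitted by regressing the additive value onto the joint reward.
import numpy as np

# Hypothetical joint payoff for a 2-agent, 3-action cooperative game
# (not taken from the paper).
payoff = np.array([
    [  8.0, -12.0, -12.0],
    [-12.0,   0.0,   0.0],
    [-12.0,   0.0,   6.0],
])

n_actions = payoff.shape[0]
q1 = np.zeros(n_actions)  # per-agent utility Q_1(a1)
q2 = np.zeros(n_actions)  # per-agent utility Q_2(a2)

lr = 0.05
rng = np.random.default_rng(0)
for _ in range(5000):
    # Uniform (off-policy) sampling over joint actions.
    a1 = rng.integers(n_actions)
    a2 = rng.integers(n_actions)
    target = payoff[a1, a2]      # one-step reward is the Bellman target here
    q_tot = q1[a1] + q2[a2]      # linear (additive) factorization
    td = target - q_tot
    # Both agents are updated with the same joint TD error,
    # which is where the implicit credit assignment happens.
    q1[a1] += lr * td
    q2[a2] += lr * td

greedy = (int(np.argmax(q1)), int(np.argmax(q2)))
print("per-agent utilities:", np.round(q1, 2), np.round(q2, 2))
print("greedy joint action:", greedy, "-> payoff", payoff[greedy])
print("optimal joint payoff:", payoff.max())
```

Under uniform sampling, the additive utilities converge toward the best least-squares additive fit of the payoff matrix, and the resulting greedy joint action in this example is suboptimal (payoff 6 rather than 8). Concentrating the sampling distribution near the current greedy policy changes the fitted solution, loosely mirroring the abstract's point that on-policy training and richer joint value classes affect convergence behavior.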
