Title
Mean Field Game and Decentralized Intelligent Adaptive Pursuit Evasion Strategy for Massive Multi-Agent System under Uncertain Environment
Authors
Abstract
In this paper, a novel decentralized intelligent adaptive optimal strategy is developed to solve the pursuit-evasion game for massive Multi-Agent Systems (MAS) in an uncertain environment. Existing strategies for pursuit-evasion games are neither efficient nor practical for large-population multi-agent systems, owing to the notorious "curse of dimensionality" and to communication limits that arise as the agent population grows. To overcome these challenges, the emerging mean field game theory is adopted and further integrated with reinforcement learning to develop a novel decentralized intelligent adaptive strategy built on a new adaptive dynamic programming architecture named the Actor-Critic-Mass (ACM). By approximating the solution of the coupled mean field equations online, the developed strategy can obtain the optimal pursuit-evasion policy even for massive MAS in an uncertain environment. In the proposed ACM-learning-based strategy, each agent maintains five neural networks: 1) a critic neural network that approximates the solution of the Hamilton-Jacobi-Isaacs (HJI) equation for the individual agent; 2) a mass neural network that estimates the population density function (i.e., the mass) of its own group; 3) an actor neural network that approximates the decentralized optimal strategy; and 4)-5) two further neural networks that estimate the opponents' group mass and their optimal cost function, respectively. Finally, comprehensive numerical simulations are provided to demonstrate the effectiveness of the designed strategy.
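The five-network structure of an ACM agent described above can be sketched in code. The sketch below is purely illustrative and is not the paper's implementation: the `MLP` class, the network dimensions, and the names `opp_mass`/`opp_cost` are all assumptions introduced here to show how one agent would hold a critic, a mass estimator, an actor, and two opponent-side estimators.

```python
import numpy as np

class MLP:
    """Tiny two-layer perceptron used as a stand-in for each network
    (hypothetical; the paper does not specify the network form here)."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_out))

    def __call__(self, x):
        # Simple forward pass: tanh hidden layer, linear output.
        return np.tanh(x @ self.W1) @ self.W2

class ACMAgent:
    """One agent's five networks in the Actor-Critic-Mass scheme
    (dimensions are placeholder assumptions)."""
    def __init__(self, state_dim=4, action_dim=2, hidden=16):
        # 1) critic: approximates the HJI value function V(x)
        self.critic = MLP(state_dim, hidden, 1, seed=1)
        # 2) mass: estimates the agent's own group density m(x)
        self.mass = MLP(state_dim, hidden, 1, seed=2)
        # 3) actor: approximates the decentralized optimal policy u(x)
        self.actor = MLP(state_dim, hidden, action_dim, seed=3)
        # 4) opponents' group mass estimator
        self.opp_mass = MLP(state_dim, hidden, 1, seed=4)
        # 5) opponents' optimal cost estimator
        self.opp_cost = MLP(state_dim, hidden, 1, seed=5)

    def act(self, x):
        """Decentralized action from local state only (no global communication)."""
        return self.actor(x)

agent = ACMAgent()
x = np.zeros(4)          # placeholder local state
u = agent.act(x)         # action vector of length action_dim
v = agent.critic(x)      # scalar value estimate
```

The key point the sketch conveys is decentralization: each agent acts on its own state plus learned density estimates, so no agent needs the full joint state of the massive population.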