Paper Title
PooL: Pheromone-inspired Communication Framework for Large Scale Multi-Agent Reinforcement Learning
Paper Authors
Paper Abstract
Poor scalability poses a great problem in multi-agent coordination. Multi-agent Reinforcement Learning (MARL) algorithms applied in small-scale multi-agent systems are hard to extend to large-scale ones because the latter are far more dynamic and the number of interactions grows exponentially as the number of agents increases. Some swarm intelligence algorithms simulate the release and utilization mechanism of pheromones to control large-scale agent coordination. Inspired by such algorithms, \textbf{PooL}, a \textbf{p}her\textbf{o}m\textbf{o}ne-based indirect communication framework applied to large-scale multi-agent reinforcement \textbf{l}earning, is proposed to solve the large-scale multi-agent coordination problem. In PooL, the pheromones released by agents are defined as the outputs of most reinforcement learning algorithms, which reflect the agents' views of the current environment. The pheromone update mechanism can efficiently organize the information of all agents and simplify the complex interactions among agents into low-dimensional representations. The pheromones perceived by an agent can be regarded as a summary of the views of nearby agents, which better reflects the real situation of the environment. Q-Learning is taken as our base model to implement PooL, and PooL is evaluated in various large-scale cooperative environments. Experiments show that agents can capture effective information through PooL and achieve higher rewards than other state-of-the-art methods with lower communication costs.
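To make the abstract's release/perceive loop concrete, below is a minimal, hedged sketch of pheromone-style indirect communication: agents deposit their Q-value vectors as pheromones on a shared field, the field decays over time, and each agent perceives a local aggregate that summarizes nearby agents' views. All names (`PheromoneField`, `decay_rate`, `radius`) and the specific update rule (exponential decay plus additive deposits, mean aggregation in a window) are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: the exact pheromone definition and update rule
# in PooL may differ from this toy grid-based version.
import numpy as np


class PheromoneField:
    """Grid storing a low-dimensional pheromone vector per cell."""

    def __init__(self, width, height, dim, decay_rate=0.9):
        self.grid = np.zeros((width, height, dim))  # one pheromone vector per cell
        self.decay_rate = decay_rate

    def deposit(self, position, pheromone):
        """An agent releases its pheromone (e.g. its Q-value vector) at its cell."""
        x, y = position
        self.grid[x, y] += pheromone

    def step(self):
        """Evaporate old pheromones so stale information fades away."""
        self.grid *= self.decay_rate

    def perceive(self, position, radius=1):
        """Summarize nearby agents' views as the mean pheromone in a local window."""
        x, y = position
        x0, x1 = max(0, x - radius), min(self.grid.shape[0], x + radius + 1)
        y0, y1 = max(0, y - radius), min(self.grid.shape[1], y + radius + 1)
        return self.grid[x0:x1, y0:y1].mean(axis=(0, 1))


# Usage sketch: pheromones stand in for the agents' Q-value outputs, and each
# agent would augment its own observation with the perceived summary before
# acting with its Q-learner.
field = PheromoneField(width=10, height=10, dim=5)
agent_positions = [(2, 3), (2, 4), (7, 7)]
q_values = [np.random.rand(5) for _ in agent_positions]  # stand-in for real Q outputs

for pos, q in zip(agent_positions, q_values):
    field.deposit(pos, q)
field.step()

for pos in agent_positions:
    summary = field.perceive(pos, radius=2)
    # augmented_obs = np.concatenate([local_obs, summary])  # fed to the Q-learner
    print(pos, summary.round(3))
```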