Decision-making with Speculative Opponent Models

Paper Title

Decision-making with Speculative Opponent Models

Authors

Sun, Jing; Chen, Shuo; Zhang, Cong; Ma, Yining; Zhang, Jie

Abstract

Opponent modelling has proven effective in enhancing the decision-making of the controlled agent by constructing models of opponent agents. However, existing methods often rely on access to the observations and actions of opponents, a requirement that is infeasible when such information is either unobservable or challenging to obtain. To address this issue, we introduce Distributional Opponent-aided Multi-agent Actor-Critic (DOMAC), the first speculative opponent modelling algorithm that relies solely on local information (i.e., the controlled agent's observations, actions, and rewards). Specifically, the actor maintains a speculated belief about the opponents through tailored speculative opponent models that predict the opponents' actions using only local information. Moreover, DOMAC features distributional critic models that estimate the return distribution of the actor's policy, yielding a more fine-grained assessment of the actor's quality. This in turn guides the training of the speculative opponent models that the actor depends upon more effectively. Furthermore, we formally derive a policy gradient theorem with the proposed opponent models. Extensive experiments on eight challenging multi-agent benchmark tasks within the MPE, Pommerman and StarCraft Multiagent Challenge (SMAC) environments demonstrate that DOMAC successfully models opponents' behaviours and delivers superior performance against state-of-the-art methods with a faster convergence speed.
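The idea of an actor that acts on a speculated belief about opponents can be illustrated with a small sketch. This is not the authors' implementation: the linear models, the function names (`opponent_belief`, `conditional_policy`, `marginal_policy`) and the choice to marginalize the actor's policy over the speculated opponent actions are all assumptions made for illustration, and the distributional critic is left out entirely.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, features):
    """Apply a weight matrix (one row per output) to a feature vector."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def opponent_belief(opp_weights, obs):
    """Hypothetical speculative opponent model.

    Predicts a distribution over the opponent's actions from the
    controlled agent's local observation only.
    """
    return softmax(linear(opp_weights, obs))


def conditional_policy(actor_weights, obs, opp_action, n_opp_actions):
    """Hypothetical actor head conditioned on one speculated opponent action."""
    one_hot = [1.0 if i == opp_action else 0.0 for i in range(n_opp_actions)]
    return softmax(linear(actor_weights, obs + one_hot))


def marginal_policy(actor_weights, opp_weights, obs):
    """Average the conditional policy under the speculated belief.

    Assumed form: pi(a | o) = sum_k p_opp(k | o) * pi(a | o, k).
    """
    belief = opponent_belief(opp_weights, obs)
    n_opp = len(belief)
    policy = [0.0] * len(actor_weights)
    for opp_action, prob in enumerate(belief):
        cond = conditional_policy(actor_weights, obs, opp_action, n_opp)
        for a, p in enumerate(cond):
            policy[a] += prob * p
    return policy


# Toy example: 2 observation features, 3 opponent actions, 2 own actions.
obs = [1.0, 0.5]
opp_w = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]]
actor_w = [
    [0.5, -0.2, 0.1, 0.0, -0.3],
    [-0.1, 0.4, 0.0, 0.2, 0.3],
]
policy = marginal_policy(actor_w, opp_w, obs)
```

In this sketch the opponent model never sees the opponent's observations or actions; under the assumptions above, any learning signal for it would have to come through the controlled agent's own return, which is the role the abstract assigns to the critic.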
