Paper Title

Specification-Guided Learning of Nash Equilibria with High Social Welfare

Authors

Kishor Jothimurugan, Suguman Bansal, Osbert Bastani, Rajeev Alur

Abstract

Reinforcement learning has been shown to be an effective strategy for automatically training policies for challenging control problems. Focusing on non-cooperative multi-agent systems, we propose a novel reinforcement learning framework for training joint policies that form a Nash equilibrium. In our approach, rather than providing low-level reward functions, the user provides high-level specifications that encode the objective of each agent. Then, guided by the structure of the specifications, our algorithm searches over policies to identify one that provably forms an $ε$-Nash equilibrium (with high probability). Importantly, it prioritizes policies in a way that maximizes social welfare across all agents. Our empirical evaluation demonstrates that our algorithm computes equilibrium policies with high social welfare, whereas state-of-the-art baselines either fail to compute Nash equilibria or compute ones with comparatively lower social welfare.
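As background for the terminology above, here is a standard formalization; the notation $\pi$, $J_i$, and $\pi_{-i}$ is introduced for illustration and is not taken from the paper. Let $\pi = (\pi_1, \ldots, \pi_n)$ be a joint policy for $n$ agents, and let $J_i(\pi)$ denote agent $i$'s expected payoff (here, plausibly the probability that its specification is satisfied). Then $\pi$ is an $ε$-Nash equilibrium if no agent can gain more than $ε$ by unilaterally deviating:

$$J_i(\pi_i', \pi_{-i}) \le J_i(\pi) + ε \quad \text{for every agent } i \text{ and every alternative policy } \pi_i',$$

and the social welfare that the algorithm maximizes over such equilibria is the total payoff $\sum_{i=1}^{n} J_i(\pi)$.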
