Paper Title

Specification-Guided Learning of Nash Equilibria with High Social Welfare

Authors

Kishor Jothimurugan, Suguman Bansal, Osbert Bastani, Rajeev Alur

Abstract

Reinforcement learning has been shown to be an effective strategy for automatically training policies for challenging control problems. Focusing on non-cooperative multi-agent systems, we propose a novel reinforcement learning framework for training joint policies that form a Nash equilibrium. In our approach, rather than providing low-level reward functions, the user provides high-level specifications that encode the objective of each agent. Then, guided by the structure of the specifications, our algorithm searches over policies to identify one that provably forms an $ε$-Nash equilibrium (with high probability). Importantly, it prioritizes policies in a way that maximizes social welfare across all agents. Our empirical evaluation demonstrates that our algorithm computes equilibrium policies with high social welfare, whereas state-of-the-art baselines either fail to compute Nash equilibria or compute ones with comparatively lower social welfare.
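As background for the terminology above, here is a standard formalization; the notation $\pi$, $J_i$, and $\pi_{-i}$ is introduced for illustration and is not taken from the paper. Let $\pi = (\pi_1, \ldots, \pi_n)$ be a joint policy for $n$ agents, and let $J_i(\pi)$ denote agent $i$'s expected payoff (here, plausibly the probability that its specification is satisfied). Then $\pi$ is an $ε$-Nash equilibrium if no agent can gain more than $ε$ by unilaterally deviating:

$$J_i(\pi_i', \pi_{-i}) \le J_i(\pi) + ε \quad \text{for every agent } i \text{ and every alternative policy } \pi_i',$$

and the social welfare that the algorithm maximizes over such equilibria is the total payoff $\sum_{i=1}^{n} J_i(\pi)$.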
