Paper title
Inducing Social Optimality in Games via Adaptive Incentive Design
Paper authors
Paper abstract
How can a social planner adaptively incentivize selfish agents who are learning in a strategic environment to induce a socially optimal outcome in the long run? We propose a two-timescale learning dynamics to answer this question in both atomic and non-atomic games. In our learning dynamics, players adopt a class of learning rules to update their strategies at a faster timescale, while a social planner updates the incentive mechanism at a slower timescale. In particular, the update of the incentive mechanism is based on each player's externality, which is evaluated as the difference between the player's marginal cost and the society's marginal cost in each time step. We show that any fixed point of our learning dynamics corresponds to the optimal incentive mechanism such that the corresponding Nash equilibrium also achieves social optimality. We also provide sufficient conditions for the learning dynamics to converge to a fixed point so that the adaptive incentive mechanism eventually induces a socially optimal outcome. Finally, we demonstrate that the sufficient conditions for convergence are satisfied in a variety of games, including (i) atomic networked quadratic aggregative games, (ii) atomic Cournot competition, and (iii) non-atomic network routing games.
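The two-timescale idea in the abstract can be illustrated with a minimal sketch on a toy atomic Cournot game. Everything concrete here is an assumption made for illustration and is not taken from the paper: the linear inverse demand `a - b * total`, the quadratic production cost with parameters `c` and `d`, the choice of social cost as the sum of the players' costs (excluding incentive payments), gradient play as the players' learning rule, the step sizes `gamma` (fast) and `beta` (slow), and the sign convention that the externality is society's marginal cost minus the player's own marginal cost.

```python
import numpy as np

def simulate(a=10.0, b=1.0, d=1.0, c=(1.0, 1.5, 2.0),
             gamma=0.1, beta=0.01, steps=20000):
    """Two-timescale sketch: players take gradient steps with step size
    gamma (fast); the planner moves the per-unit incentive theta toward
    each player's current externality with step size beta (slow)."""
    c = np.asarray(c, dtype=float)
    x = np.zeros_like(c)       # production quantities (player strategies)
    theta = np.zeros_like(c)   # per-unit incentive (tax) charged to each player
    for _ in range(steps):
        total = x.sum()
        # Assumed player cost:
        #   -(a - b*total)*x_i + c_i*x_i + 0.5*d*x_i**2 + theta_i*x_i
        # Its derivative with respect to the player's own quantity x_i:
        player_grad = -a + b * total + b * x + c + d * x + theta
        # Assumed social cost: sum of player costs without incentive payments.
        # Society's marginal cost minus the player's marginal cost:
        externality = b * (total - x)
        # Fast timescale: projected gradient play (quantities stay >= 0).
        x_next = np.maximum(0.0, x - gamma * player_grad)
        # Slow timescale: incentive tracks the externality at the current x.
        theta = theta + beta * (externality - theta)
        x = x_next
    return x, theta

def social_optimum(a=10.0, b=1.0, d=1.0, c=(1.0, 1.5, 2.0)):
    """Interior minimizer of the assumed social cost, from the first-order
    condition -a + 2*b*sum(x) + c_i + d*x_i = 0 for every player i."""
    c = np.asarray(c, dtype=float)
    n = len(c)
    hessian = 2.0 * b * np.ones((n, n)) + d * np.eye(n)
    return np.linalg.solve(hessian, a - c)
```

At a fixed point of this sketch, `theta` equals the externality and each player's gradient is zero, so society's marginal cost is zero as well. For these toy parameters, `simulate()` can be compared against `social_optimum()` to check that the induced equilibrium matches the social optimum; the paper's general convergence conditions are not reproduced here.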