论文标题
多人匪徒的自私鲁棒性和平衡
Selfish Robustness and Equilibria in Multi-Player Bandits
论文作者
论文摘要
由认知无线电的动机,随机多人多军匪徒最近引起了很多兴趣。在这类问题中,如果其中一些人同时拉上同一手臂,则几名玩家同时拉动了双臂并遇到碰撞 - 0奖励。尽管大多数考虑了最大化集体奖励(遵循固定协议)的合作案例,但对恶意玩家的稳健性是一个至关重要且挑战性的关注点。现有方法仅考虑对逆境干扰者的情况,其目标是盲目地最大程度地减少集体奖励。相反,我们将考虑更自然的自私者阶层,其激励措施是最大程度地提高个人奖励,这可能是以牺牲社会福利为代价的。当观察到手臂性能时,我们为自私玩家(又称纳什平衡)提供了第一种算法(又称纳什平衡)。当还观察到碰撞时,严峻的触发策略类型可以实现一些基于隐性的算法,并且我们在两种不同的环境中构建了强大的算法:同质性(遗憾与集中式最佳最佳算法相比)和异类案例(对于适应性的和相关的遗憾记录)。当仅观察奖励或手臂在玩家中任意变化时,我们还会提供不可能的结果。
Motivated by cognitive radios, stochastic multi-player multi-armed bandits gained a lot of interest recently. In this class of problems, several players simultaneously pull arms and encounter a collision - with 0 reward - if some of them pull the same arm at the same time. While the cooperative case where players maximize the collective reward (obediently following some fixed protocol) has been mostly considered, robustness to malicious players is a crucial and challenging concern. Existing approaches consider only the case of adversarial jammers whose objective is to blindly minimize the collective reward. We shall consider instead the more natural class of selfish players whose incentives are to maximize their individual rewards, potentially at the expense of the social welfare. We provide the first algorithm robust to selfish players (a.k.a. Nash equilibrium) with a logarithmic regret, when the arm performance is observed. When collisions are also observed, Grim Trigger type of strategies enable some implicit communication-based algorithms and we construct robust algorithms in two different settings: the homogeneous (with a regret comparable to the centralized optimal one) and heterogeneous cases (for an adapted and relevant notion of regret). We also provide impossibility results when only the reward is observed or when arm means vary arbitrarily among players.