Title

Peer Prediction for Learning Agents

Authors

Shi Feng, Fang-Yi Yu, Yiling Chen

Abstract


Peer prediction refers to a collection of mechanisms for eliciting information from human agents when direct verification of the obtained information is unavailable. They are designed to have a game-theoretic equilibrium where everyone reveals their private information truthfully. This result holds under the assumption that agents are Bayesian and that they each adopt a fixed strategy across all tasks. Human agents, however, are observed in many domains to exhibit learning behavior in sequential settings. In this paper, we explore the dynamics of sequential peer prediction mechanisms when participants are learning agents. We first show that the no-regret property alone for the agents' learning algorithms cannot guarantee convergence to the truthful strategy. We then focus on a family of learning algorithms where strategy updates only depend on agents' cumulative rewards, and prove that agents' strategies in the popular Correlated Agreement (CA) mechanism converge to truthful reporting when they use algorithms from this family. This family of algorithms is not necessarily no-regret, but includes several familiar no-regret learning algorithms (e.g., multiplicative weight update and Follow the Perturbed Leader) as special cases. Simulation of several algorithms in this family, as well as the $ε$-greedy algorithm, which is outside of this family, shows convergence to the truthful strategy in the CA mechanism.
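The abstract names multiplicative weight update as one member of the family whose strategy updates depend only on cumulative rewards. A minimal sketch of that update rule (the two-strategy setup and the reward numbers below are purely illustrative, not the CA mechanism's actual payments) might look like:

```python
import math

def mwu_distribution(cumulative_rewards, eta=0.1):
    """Multiplicative weight update over a finite strategy set.

    Each strategy's weight depends only on its cumulative reward,
    w_i = exp(eta * R_i), matching the family described in the abstract.
    Returns the normalized probability of playing each strategy.
    """
    weights = [math.exp(eta * r) for r in cumulative_rewards]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative example: after 50 rounds, the "truthful" strategy has
# accumulated reward 50 and an uninformative strategy only 10.
probs = mwu_distribution([50.0, 10.0])
print(probs)  # probability mass concentrates on the truthful strategy
```

Under rewards like these, where truthful reporting earns strictly more per round, the probability of the truthful strategy approaches 1 as its cumulative-reward lead grows, which is the convergence behavior the paper establishes for the CA mechanism.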
