Paper Title


No-Regret and Incentive-Compatible Online Learning

Authors

Freeman, Rupert, Pennock, David M., Podimata, Chara, Vaughan, Jennifer Wortman

Abstract


We study online learning settings in which experts act strategically to maximize their influence on the learning algorithm's predictions by potentially misreporting their beliefs about a sequence of binary events. Our goal is twofold. First, we want the learning algorithm to be no-regret with respect to the best fixed expert in hindsight. Second, we want incentive compatibility, a guarantee that each expert's best strategy is to report his true beliefs about the realization of each event. To achieve this goal, we build on the literature on wagering mechanisms, a type of multi-agent scoring rule. We provide algorithms that achieve no regret and incentive compatibility for myopic experts for both the full and partial information settings. In experiments on datasets from FiveThirtyEight, our algorithms have regret comparable to classic no-regret algorithms, which are not incentive-compatible. Finally, we identify an incentive-compatible algorithm for forward-looking strategic agents that exhibits diminishing regret in practice.
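For context on the baselines mentioned above, here is a minimal sketch of Hedge (multiplicative weights), a classic no-regret algorithm of the kind the experiments compare against. This is an illustrative baseline only, not the paper's incentive-compatible mechanism; the function name and parameters are my own.

```python
import math

def hedge(expert_losses, eta=0.5):
    """Classic Hedge / multiplicative-weights update (illustrative sketch).

    expert_losses: list of rounds, each a list of per-expert losses in [0, 1].
    Returns the final normalized weight assigned to each expert.
    """
    n = len(expert_losses[0])
    weights = [1.0] * n
    for losses in expert_losses:
        # Exponentially down-weight each expert in proportion to its loss.
        weights = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(weights)
    return [w / total for w in weights]

# After 10 rounds in which expert 0 is always right and expert 1 always
# wrong, nearly all weight concentrates on expert 0.
final = hedge([[0.0, 1.0]] * 10)
```

With an appropriately tuned learning rate `eta`, Hedge's cumulative loss exceeds that of the best fixed expert in hindsight by only a sublinear (no-regret) term; the paper's point is that such updates are not incentive-compatible, since a strategic expert can misreport to gain influence.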
