论文标题
及时竞争的土匪在不同的匹配市场
Competing Bandits in Time Varying Matching Markets
论文作者
论文摘要
我们研究在双面非平稳匹配市场中的在线学习问题,目标是融合稳定的匹配。特别是,我们考虑了市场的一侧(武器)在另一侧固定了一套已知的偏好集,即玩家。尽管在玩家固定但未知的偏好时已经研究了这个问题,但在这项工作中,我们研究了如何学习玩家何时时间变化和未知的问题。我们的贡献是一种可以处理任何类型的偏好结构和变化方案的方法。我们表明,使用拟议的算法,每个玩家都会获得{$ \ widetilde {\ Mathcal {o}}}}}(l^{1/2} _tt^{1/2}} $} $} $ l _ $ l_t $ $ l_t $的均值。因此,我们表明,尽管竞争达到恒定因素的差异,但可以达到单一学习的最佳率。我们还讨论了该算法的扩展,以使更改数的数量先验。
We study the problem of online learning in two-sided non-stationary matching markets, where the objective is to converge to a stable match. In particular, we consider the setting where one side of the market, the arms, has fixed known set of preferences over the other side, the players. While this problem has been studied when the players have fixed but unknown preferences, in this work we study the problem of how to learn when the preferences of the players are time varying and unknown. Our contribution is a methodology that can handle any type of preference structure and variation scenario. We show that, with the proposed algorithm, each player receives a uniform sub-linear regret of {$\widetilde{\mathcal{O}}(L^{1/2}_TT^{1/2})$} up to the number of changes in the underlying preferences of the agents, $L_T$. Therefore, we show that the optimal rates for single-agent learning can be achieved in spite of the competition up to a difference of a constant factor. We also discuss extensions of this algorithm to the case where the number of changes need not be known a priori.