Paper title
UniRank: Unimodal Bandit Algorithm for Online Ranking
Paper authors
Paper abstract
We tackle a new emerging problem: finding an optimal monopartite matching in a weighted graph. The semi-bandit version, where a full matching is sampled at each iteration, has been addressed by \cite{ADMA}, yielding an algorithm with an expected regret in $O\left(\frac{L\log(L)}{\Delta}\log(T)\right)$ with $2L$ players, $T$ iterations, and a minimum reward gap $\Delta$. We reduce this bound in two steps. First, as in \cite{GRAB} and \cite{UniRank}, we use the unimodality property of the expected reward on an appropriate graph to design an algorithm with a regret in $O\left(\frac{L}{\Delta}\log(T)\right)$. Second, we show that by shifting the focus to the main question `\emph{Is user $i$ better than user $j$?}', this regret becomes $O\left(\frac{L\Delta}{\tilde{\Delta}^2}\log(T)\right)$, where $\tilde{\Delta} > \Delta$ derives from a better way of comparing users. Experimental results confirm that these theoretical findings hold in practice.
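To make the `Is user $i$ better than user $j$?' idea concrete, here is a minimal toy sketch (not the paper's UniRank algorithm): it maintains a candidate ranking and repeatedly compares an adjacent pair by sampling Bernoulli rewards, swapping the pair once the empirical win counts clearly favor the lower-ranked item. The means, horizon, and swap threshold are all illustrative assumptions.

```python
import random

def pairwise_rank_sketch(true_means, horizon=20000, seed=0):
    """Toy local-swap bandit: rank items by answering adjacent
    'is i better than j?' questions from sampled Bernoulli rewards.
    Illustrative only; not the algorithm analyzed in the paper."""
    rng = random.Random(seed)
    L = len(true_means)
    order = list(range(L))                  # current ranking hypothesis
    wins = [[0, 0] for _ in range(L - 1)]   # [upper-wins, lower-wins] per adjacent slot
    for _ in range(horizon):
        k = rng.randrange(L - 1)            # pick an adjacent pair uniformly
        i, j = order[k], order[k + 1]
        ri = rng.random() < true_means[i]
        rj = rng.random() < true_means[j]
        if ri != rj:                        # only rounds with distinct rewards are informative
            wins[k][0 if ri else 1] += 1
        wi, wj = wins[k]
        # swap when the lower-ranked item dominates by a crude deviation margin
        if wi + wj >= 30 and wj > wi + 2 * (wi + wj) ** 0.5:
            order[k], order[k + 1] = order[k + 1], order[k]
            for m in (k - 1, k, k + 1):     # stats now describe new pairs; reset them
                if 0 <= m < L - 1:
                    wins[m] = [0, 0]
    return order
```

For instance, `pairwise_rank_sketch([0.2, 0.9, 0.5])` recovers the ranking by decreasing mean. The point of the pairwise view, as in the abstract, is that the swap test depends on how often $i$ beats $j$ in informative rounds, a gap that can be larger than the raw reward gap $\Delta$.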