Paper title
UniRank: Unimodal Bandit Algorithm for Online Ranking
Paper authors
Paper abstract
We tackle a new emerging problem: finding an optimal monopartite matching in a weighted graph. The semi-bandit version, where a full matching is sampled at each iteration, has been addressed by \cite{ADMA}, yielding an algorithm with an expected regret in $O\left(\frac{L\log(L)}{\Delta}\log(T)\right)$ with $2L$ players, $T$ iterations, and a minimum reward gap $\Delta$. We reduce this bound in two steps. First, as in \cite{GRAB} and \cite{UniRank}, we use the unimodality property of the expected reward on an appropriate graph to design an algorithm with a regret in $O\left(\frac{L}{\Delta}\log(T)\right)$. Second, we show that by shifting the focus to the main question `\emph{Is user $i$ better than user $j$?}', this regret becomes $O\left(\frac{L\Delta}{\tilde{\Delta}^2}\log(T)\right)$, where $\tilde{\Delta} > \Delta$ derives from a better way of comparing users. Experimental results confirm that these theoretical findings hold in practice.
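To make the `Is user $i$ better than user $j$?' idea concrete, here is a minimal toy sketch (not the paper's UniRank algorithm): it maintains a candidate ranking and repeatedly compares an adjacent pair by sampling Bernoulli rewards, swapping the pair once the empirical win counts clearly favor the lower-ranked item. The means, horizon, and swap threshold are all illustrative assumptions.

```python
import random

def pairwise_rank_sketch(true_means, horizon=20000, seed=0):
    """Toy local-swap bandit: rank items by answering adjacent
    'is i better than j?' questions from sampled Bernoulli rewards.
    Illustrative only; not the algorithm analyzed in the paper."""
    rng = random.Random(seed)
    L = len(true_means)
    order = list(range(L))                  # current ranking hypothesis
    wins = [[0, 0] for _ in range(L - 1)]   # [upper-wins, lower-wins] per adjacent slot
    for _ in range(horizon):
        k = rng.randrange(L - 1)            # pick an adjacent pair uniformly
        i, j = order[k], order[k + 1]
        ri = rng.random() < true_means[i]
        rj = rng.random() < true_means[j]
        if ri != rj:                        # only rounds with distinct rewards are informative
            wins[k][0 if ri else 1] += 1
        wi, wj = wins[k]
        # swap when the lower-ranked item dominates by a crude deviation margin
        if wi + wj >= 30 and wj > wi + 2 * (wi + wj) ** 0.5:
            order[k], order[k + 1] = order[k + 1], order[k]
            for m in (k - 1, k, k + 1):     # stats now describe new pairs; reset them
                if 0 <= m < L - 1:
                    wins[m] = [0, 0]
    return order
```

For instance, `pairwise_rank_sketch([0.2, 0.9, 0.5])` recovers the ranking by decreasing mean. The point of the pairwise view, as in the abstract, is that the swap test depends on how often $i$ beats $j$ in informative rounds, a gap that can be larger than the raw reward gap $\Delta$.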