Regret Analysis for Hierarchical Experts Bandit Problem

Paper Title

Regret Analysis for Hierarchical Experts Bandit Problem

Authors

Guo, Qihan; Wang, Siwei; Zhu, Jun

Abstract

We study an extension of the standard bandit problem in which there are R layers of experts. The experts make selections layer by layer, and only the experts in the last layer can play arms. The goal of the learning policy is to minimize the total regret in this hierarchical experts setting. We first analyze a case in which the total regret grows linearly with the number of layers. We then focus on the case in which all experts play the Upper Confidence Bound (UCB) strategy and give several sub-linear upper bounds on the regret under different circumstances. Finally, we design experiments that support the regret analysis for the general case of the hierarchical UCB structure and show the practical significance of our theoretical results. The results offer insights into what makes a hierarchical decision structure reasonable.
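To make the setting concrete, here is a minimal simulation sketch of a layered structure in which every expert runs UCB1. It is not the paper's algorithm or experimental setup. The tree layout (a single top expert, a fixed branching factor, disjoint arm sets for the last-layer experts), the Bernoulli rewards, the rule that every expert on the chosen path observes the final reward, and all names (`UCB1`, `Expert`, `build`, `play_round`, `simulate`) are assumptions made for illustration.

```python
import math
import random


class UCB1:
    """UCB1 index policy over n options (child experts or arms)."""

    def __init__(self, n):
        self.counts = [0] * n
        self.sums = [0.0] * n
        self.t = 0

    def select(self):
        # Try every option once before using the index.
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        return max(
            range(len(self.counts)),
            key=lambda i: self.sums[i] / self.counts[i]
            + math.sqrt(2 * math.log(self.t) / self.counts[i]),
        )

    def update(self, i, reward):
        self.counts[i] += 1
        self.sums[i] += reward
        self.t += 1


class Expert:
    """An expert whose options are child experts, or arm ids in the last layer."""

    def __init__(self, children):
        self.children = children
        self.policy = UCB1(len(children))


def build(layers, branching, arm_ids):
    # Assumed layout: arms are split evenly among the last-layer experts.
    if layers == 1:
        return Expert(list(arm_ids))
    size = len(arm_ids) // branching
    return Expert([
        build(layers - 1, branching, arm_ids[k * size:(k + 1) * size])
        for k in range(branching)
    ])


def play_round(root, means, rng):
    # Selections are made layer by layer; only the last layer plays an arm.
    path, node = [], root
    while isinstance(node, Expert):
        i = node.policy.select()
        path.append((node, i))
        node = node.children[i]
    reward = 1.0 if rng.random() < means[node] else 0.0
    # Assumption: every expert on the path is updated with the final reward.
    for expert, i in path:
        expert.policy.update(i, reward)
    return node, reward


def simulate(layers, branching, means, horizon, seed=0):
    """Return the cumulative pseudo-regret against the best single arm."""
    rng = random.Random(seed)
    root = build(layers, branching, list(range(len(means))))
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        arm, _ = play_round(root, means, rng)
        regret += best - means[arm]
    return regret


if __name__ == "__main__":
    arm_means = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
    print(simulate(layers=3, branching=2, means=arm_means, horizon=2000))
```

Running `simulate` with different values of `layers` while keeping the arm means fixed is one simple way to observe how the cumulative regret changes with depth under these assumptions. The number of arms must be divisible by `branching ** (layers - 1)` so that every last-layer expert receives at least one arm.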
