Paper Title
Tight Lower Bounds for Combinatorial Multi-Armed Bandits
Paper Authors
Paper Abstract
The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round, observes feedback for each of these arms and aims to maximize a known reward function of the arms it chose. While previous work proved regret upper bounds in this setting for general reward functions, only a few works provided matching lower bounds, all for specific reward functions. In this work, we prove regret lower bounds for combinatorial bandits that hold under mild assumptions for all smooth reward functions. We derive both problem-dependent and problem-independent bounds and show that the recently proposed Gini-weighted smoothness parameter (Merlis and Mannor, 2019) also determines the lower bounds for monotone reward functions. Notably, this implies that our lower bounds are tight up to log-factors.
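The interaction protocol described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the arm means, the Bernoulli feedback model, and the linear (sum) reward are hypothetical choices made here for concreteness, whereas the paper's bounds cover general smooth reward functions.

```python
import random

def play_round(means, chosen):
    """One round of a combinatorial bandit with semi-bandit feedback:
    the agent observes feedback for every arm in its chosen set and
    receives a known reward function of that feedback (here, the sum)."""
    # Per-arm feedback: a Bernoulli sample for each chosen arm (assumed model).
    feedback = {a: int(random.random() < means[a]) for a in chosen}
    # Reward is a known function of the chosen arms' feedback; a linear
    # (sum) reward is used here purely as an example.
    reward = sum(feedback.values())
    return feedback, reward

means = [0.9, 0.2, 0.7, 0.5]          # hypothetical arm means
feedback, reward = play_round(means, chosen={0, 2})
```

Over many rounds, the agent's regret is the gap between its cumulative reward and that of the best fixed arm set; the paper lower-bounds this quantity for all smooth reward functions.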