Paper Title

Best of Many Worlds Guarantees for Online Learning with Knapsacks

Authors

Andrea Celli, Matteo Castiglioni, Christian Kroer

Abstract

We study online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$ resource constraints. By casting the learning process over a suitably defined space of strategy mixtures, we recover strong duality on a Lagrangian relaxation of the underlying optimization problem, even for general settings with non-convex reward and resource-consumption functions. Then, we provide the first best-of-many-worlds type framework for this setting, with no-regret guarantees under stochastic, adversarial, and non-stationary inputs. In the stochastic case, our framework yields the same regret guarantees as prior work. On the other hand, when budgets grow at least linearly in the time horizon, it allows us to provide a constant competitive ratio in the adversarial case, which improves over the best known upper bound of $O(\log m \log T)$. Moreover, our framework allows the decision maker to handle non-convex reward and cost functions. We provide two game-theoretic applications of our framework to give further evidence of its flexibility. In doing so, we show that it can be employed to implement budget-pacing mechanisms in repeated first-price auctions.
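The abstract describes learning under knapsack constraints via a Lagrangian relaxation. As a rough illustration of the general primal-dual idea (this is only a toy sketch of a standard primal-dual scheme for bandits with knapsacks, not the authors' algorithm; the instance, step sizes, and action set below are all invented for illustration):

```python
import numpy as np

# Toy instance (hypothetical): K discrete actions, one resource, horizon T.
rng = np.random.default_rng(0)
K, T = 4, 2000
rho = 0.25                        # per-round budget rate: total budget B = rho * T
B = rho * T

rewards = rng.uniform(0.2, 1.0, size=K)   # reward of each action
costs = rng.uniform(0.1, 0.9, size=K)     # resource cost of each action

# Primal player: exponential weights (Hedge) over actions, fed the
# Lagrangian payoff r(a) - lam * c(a).
w = np.ones(K)
eta = np.sqrt(np.log(K) / T)      # Hedge step size
lam = 0.0                         # dual (Lagrange) multiplier
gamma = 1.0 / np.sqrt(T)          # dual step size
lam_max = 1.0 / rho               # standard cap on the dual variable

total_reward, spent = 0.0, 0.0
for t in range(T):
    p = w / w.sum()
    a = rng.choice(K, p=p)
    if spent + costs[a] > B:      # stop before the budget is violated
        break
    total_reward += rewards[a]
    spent += costs[a]
    # Primal update on the Lagrangian payoff of every action.
    w *= np.exp(eta * (rewards - lam * costs))
    # Dual ascent on lam * (c_t - rho), projected onto [0, lam_max].
    lam = min(max(lam + gamma * (costs[a] - rho), 0.0), lam_max)
```

The dual variable rises when the realized per-round spend exceeds the pacing rate `rho` and falls otherwise, which is the same pacing intuition the paper applies to repeated first-price auctions.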
