Paper Title

A Unifying Framework for Online Optimization with Long-Term Constraints

Paper Authors

Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Giulia Romano, Nicola Gatti

Paper Abstract

We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints. The goal of the decision maker is to maximize their total reward while at the same time achieving small cumulative constraint violation across the $T$ rounds. We present the first best-of-both-worlds-type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown stochastic model and in the case in which they are selected at each round by an adversary. Our algorithm is the first to provide guarantees in the adversarial setting with respect to the optimal fixed strategy that satisfies the long-term constraints. In particular, it guarantees a $\rho/(1+\rho)$ fraction of the optimal reward and sublinear regret, where $\rho$ is a feasibility parameter related to the existence of strictly feasible solutions. Our framework employs traditional regret minimizers as black-box components. Therefore, by instantiating it with an appropriate choice of regret minimizers, it can handle the full-feedback as well as the bandit-feedback setting. Moreover, it allows the decision maker to seamlessly handle scenarios with non-convex rewards and constraints. We show how our framework can be applied in the context of budget-management mechanisms for repeated auctions in order to guarantee long-term constraints that are not packing (e.g., ROI constraints).
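The abstract's key structural claim is that the framework composes traditional regret minimizers as black-box components. The sketch below illustrates one generic primal-dual composition of that kind: a Hedge-style primal player picks actions against the Lagrangian utility $f_t(x) - \lambda^\top g_t(x)$, while a projected-gradient dual player updates the multipliers $\lambda$. Every concrete choice here (Hedge, gradient ascent, the step sizes, the multiplier cap, the function names) is an illustrative assumption, not the paper's exact instantiation.

```python
import numpy as np

class Hedge:
    """Multiplicative-weights regret minimizer over n actions (illustrative)."""

    def __init__(self, n, eta):
        self.weights = np.ones(n)
        self.eta = eta

    def recommend(self):
        return self.weights / self.weights.sum()

    def observe(self, utilities):
        # Exponential update on the observed utility vector,
        # renormalized for numerical stability.
        self.weights *= np.exp(self.eta * utilities)
        self.weights /= self.weights.max()

def primal_dual_run(rewards, costs, T, multiplier_cap=1.0, seed=0):
    """Hypothetical primal-dual loop with two black-box regret minimizers.

    rewards[t]: reward of each action at round t (shape: n).
    costs[t]:   constraint costs of each action at round t (shape: m x n);
                the long-term constraints ask the cumulative costs to stay small.
    """
    rng = np.random.default_rng(seed)
    n = rewards[0].shape[0]
    m = costs[0].shape[0]
    primal = Hedge(n, eta=np.sqrt(np.log(n) / T))
    lam = np.zeros(m)                 # dual multipliers
    dual_step = 1.0 / np.sqrt(T)
    total_reward, violation = 0.0, np.zeros(m)
    for t in range(T):
        x = primal.recommend()
        action = rng.choice(n, p=x)
        total_reward += rewards[t][action]
        violation += costs[t][:, action]
        # Primal player maximizes the Lagrangian utility f_t - lam^T g_t.
        primal.observe(rewards[t] - lam @ costs[t])
        # Dual player ascends on the realized constraint costs,
        # projected onto a bounded multiplier set.
        lam = np.clip(lam + dual_step * costs[t][:, action], 0.0, multiplier_cap)
    return total_reward, violation
```

Under full feedback the primal player observes the entire Lagrangian utility vector, as above; under bandit feedback one would swap Hedge for a bandit regret minimizer such as EXP3. That drop-in substitution is exactly the black-box modularity the abstract refers to.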
