论文标题
上下文潘多拉的盒子
Contextual Pandora's Box
论文作者
论文摘要
潘多拉(Pandora)的盒子是一个基本的随机优化问题,决策者必须找到一个良好的选择,同时最大程度地减少探索每种选择价值的搜索成本。在原始配方中,假定为所有替代方案的值提供了准确的分布,而最近的工作研究潘多拉盒子的在线变体最初是未知的。在这项工作中,我们在在线环境中研究了潘多拉的盒子,同时融合了上下文。在每个回合中,我们都会有多种具有上下文,勘探成本和未知值的替代方案,该价值来自未知的分布,这些分布可能会在每个回合中发生变化。我们的主要结果是一种无重组算法,其性能与最佳算法相当出色,该算法准确地知道所有先前的分布。我们的算法即使在强盗设置中也起作用,在该设置中,该算法永远不会学习未探索的替代方案的值。使我们的结果能够实现的关键技术是对上下文匪徒中的可靠性条件进行了新的修改,将上下文与每个替代品分布(其“保留值”)的足够统计量连接起来,而不是其均值。
Pandora's Box is a fundamental stochastic optimization problem, where the decision-maker must find a good alternative while minimizing the search cost of exploring the value of each alternative. In the original formulation, it is assumed that accurate distributions are given for the values of all the alternatives, while recent work studies the online variant of Pandora's Box where the distributions are originally unknown. In this work, we study Pandora's Box in the online setting, while incorporating context. At every round, we are presented with a number of alternatives each having a context, an exploration cost and an unknown value drawn from an unknown distribution that may change at every round. Our main result is a no-regret algorithm that performs comparably well to the optimal algorithm which knows all prior distributions exactly. Our algorithm works even in the bandit setting where the algorithm never learns the values of the alternatives that were not explored. The key technique that enables our result is a novel modification of the realizability condition in contextual bandits that connects a context to a sufficient statistic of each alternative's distribution (its "reservation value") rather than its mean.