Title
Smoothed Online Learning is as Easy as Statistical Learning
Authors
Abstract
Much of modern learning theory has been split between two regimes: the classical offline setting, where data arrive independently, and the online setting, where data arrive adversarially. While the former model is often both computationally and statistically tractable, the latter requires no distributional assumptions. In an attempt to achieve the best of both worlds, previous work proposed the smooth online setting, where each sample is drawn from an adversarially chosen distribution that is smooth, i.e., one with bounded density with respect to a fixed dominating measure. We provide tight bounds on the minimax regret of learning a nonparametric function class, with nearly optimal dependence on both the horizon and smoothness parameters. Furthermore, we provide the first oracle-efficient, no-regret algorithms in this setting. In particular, we propose an oracle-efficient improper algorithm whose regret achieves optimal dependence on the horizon, and a proper algorithm requiring only a single oracle call per round whose regret has the optimal horizon dependence in the classification setting and is sublinear in general. Both algorithms have exponentially worse dependence on the smoothness parameter of the adversary than the minimax rate. We then prove a lower bound on the oracle complexity of any proper learning algorithm, which matches the oracle-efficient upper bounds up to a polynomial factor, thus demonstrating the existence of a statistical-computational gap in smooth online learning. Finally, we apply our results to the contextual bandit setting to show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits when contexts arrive in a smooth manner.
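For concreteness, the two notions the abstract leans on can be written out explicitly. The LaTeX sketch below follows the standard conventions of this literature; the symbols \sigma, \mu, T, \mathcal{F}, and the loss \ell are assumed notation for this illustration, not necessarily the paper's own.

% A minimal, self-contained sketch of the definitions behind the abstract.
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% sigma-smoothness: the adversary's distribution p over contexts has
% density bounded by 1/sigma with respect to a fixed dominating measure mu.
\[
  p \ \text{is $\sigma$-smooth w.r.t.\ $\mu$}
  \iff
  \left\lVert \frac{\mathrm{d}p}{\mathrm{d}\mu} \right\rVert_{\infty}
  \le \frac{1}{\sigma}.
\]

% Regret over horizon T: the learner's cumulative loss against the best
% fixed function f in the class F in hindsight; "no-regret" means this
% quantity is o(T).
\[
  \mathrm{Reg}_T
  = \sum_{t=1}^{T} \ell\bigl(\hat{f}_t(x_t), y_t\bigr)
  - \min_{f \in \mathcal{F}} \sum_{t=1}^{T} \ell\bigl(f(x_t), y_t\bigr).
\]

\end{document}

In this vocabulary, an algorithm is oracle-efficient if its per-round computation is polynomial apart from calls to an optimization oracle over \mathcal{F} (typically an ERM oracle); the abstract's "single oracle call per round" guarantee refers to such calls.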