Paper Title

Smoothed Online Convex Optimization Based on Discounted-Normal-Predictor

Authors

Lijun Zhang, Wei Jiang, Jinfeng Yi, Tianbao Yang

Abstract

In this paper, we investigate an online prediction strategy named Discounted-Normal-Predictor (Kapralov and Panigrahy, 2010) for smoothed online convex optimization (SOCO), in which the learner needs to minimize not only the hitting cost but also the switching cost. In the setting of learning with expert advice, Daniely and Mansour (2019) demonstrate that Discounted-Normal-Predictor can be utilized to yield nearly optimal regret bounds over any interval, even in the presence of switching costs. Inspired by their results, we develop a simple algorithm for SOCO: combining online gradient descent (OGD) with different step sizes sequentially by Discounted-Normal-Predictor. Despite its simplicity, we prove that it is able to minimize the adaptive regret with switching cost, i.e., attaining nearly optimal regret with switching cost on every interval. By exploiting the theoretical guarantee of OGD for dynamic regret, we further show that the proposed algorithm can minimize the dynamic regret with switching cost in every interval.
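To make the high-level recipe concrete, below is a minimal Python sketch of ours, not the paper's exact construction: several OGD base learners with a geometric grid of step sizes are folded together pair by pair, each pairing weighted by a discounted statistic passed through an erf-shaped confidence function standing in for the normal potential of Kapralov and Panigrahy (2010). The class names (`OGDExpert`, `DNPCombiner`), the constants `rho` and `scale`, the surrogate gain fed to each combiner, and the toy cost sequence are all illustrative assumptions.

```python
import math

import numpy as np


def project(x, radius=1.0):
    """Euclidean projection onto an l2-ball (an assumed bounded domain)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


class OGDExpert:
    """Online gradient descent with one fixed step size (a base learner)."""

    def __init__(self, dim, eta):
        self.x = np.zeros(dim)
        self.eta = eta

    def step(self, grad):
        self.x = project(self.x - self.eta * grad)


class DNPCombiner:
    """Weights two decision streams via a discounted statistic z_t mapped
    through an erf-shaped confidence w_t in [0, 1]. The exact potential of
    Discounted-Normal-Predictor differs; only the general shape is kept."""

    def __init__(self, rho=0.99, scale=5.0):
        self.z = 0.0        # discounted cumulative gain of stream 1 over stream 2
        self.rho = rho      # discount factor (illustrative value)
        self.scale = scale  # confidence bandwidth (illustrative value)

    def weight(self):
        return 0.5 * (1.0 + math.erf(self.z / (math.sqrt(2.0) * self.scale)))

    def update(self, gain):
        self.z = self.rho * self.z + gain


def combined_decision(experts, combiners):
    """Fold the experts together sequentially, from the last to the first."""
    x = experts[-1].x
    for expert, comb in zip(reversed(experts[:-1]), reversed(combiners)):
        w = comb.weight()
        x = w * expert.x + (1.0 - w) * x
    return x


# Toy run: quadratic hitting costs f_t(x) = 0.5 * ||x - b_t||^2 with slowly
# drifting targets b_t, plus the switching cost ||x_t - x_{t-1}||.
rng = np.random.default_rng(0)
dim, K = 2, 4
experts = [OGDExpert(dim, eta=0.5 ** k) for k in range(K)]  # geometric step sizes
combiners = [DNPCombiner() for _ in range(K - 1)]
x_prev, total_cost = np.zeros(dim), 0.0
for t in range(200):
    b = np.array([math.sin(0.05 * t), math.cos(0.05 * t)]) * 0.5
    x = combined_decision(experts, combiners)
    total_cost += 0.5 * float(np.dot(x - b, x - b)) + float(np.linalg.norm(x - x_prev))
    # surrogate gain: how much better each expert's point was than the played point
    for expert, comb in zip(experts[:-1], combiners):
        comb.update(0.5 * float(np.dot(x - b, x - b) - np.dot(expert.x - b, expert.x - b)))
    for e in experts:
        e.step(e.x - b)  # gradient of the quadratic hitting cost at e.x
    x_prev = x
print(f"average cost per round: {total_cost / 200:.4f}")
```

The geometric step-size grid reflects the standard reasoning behind such combinations: on any interval, some base learner's step size is close to the optimal tuning for that interval, so a combiner that tracks the better stream can inherit nearly optimal regret there while its discounting keeps the weights, and hence the switching cost, stable.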
