Paper Title

Online Robust and Adaptive Learning from Data Streams

Authors

Fukushima, Shintaro, Nitanda, Atsushi, Yamanishi, Kenji

Abstract

In online learning from non-stationary data streams, it is necessary to learn robustly in the presence of outliers and to adapt quickly to changes in the underlying data generating mechanism. In this paper, we refer to the former attribute of online learning algorithms as robustness and to the latter as adaptivity. There is an obvious tradeoff between the two attributes. Quantifying and evaluating this tradeoff is a fundamental issue because it provides important information about the data generating mechanism. However, no previous work has considered the tradeoff quantitatively. We propose a novel algorithm, the stochastic approximation-based robustness-adaptivity algorithm (SRA), to evaluate the tradeoff. The key idea of SRA is to update the parameters of a distribution, or sufficient statistics, with a biased stochastic approximation scheme, while dropping data points for which the stochastic update has a large norm. We address the relation between two parameters: the step size of the stochastic approximation and the threshold on the norm of the stochastic update. The former controls adaptivity and the latter robustness. We give a theoretical analysis of the non-asymptotic convergence of SRA in the presence of outliers, which depends on both the step size and the threshold parameter. Because SRA is formulated on the majorization-minimization principle, it is a general algorithm that includes many algorithms, such as the online EM algorithm and stochastic gradient descent. Empirical experiments on both synthetic and real datasets demonstrate that SRA is superior to previous methods.
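The abstract's key idea can be illustrated on the simplest case, online mean estimation: apply the stochastic-approximation update only when its norm is below a threshold, so that outliers (which produce large updates) are dropped. The sketch below is a hypothetical simplification for intuition, not the paper's general majorization-minimization formulation; the function name `sra_mean` and the specific parameter values are assumptions.

```python
import numpy as np

def sra_mean(stream, eta=0.05, lam=2.0):
    """SRA-style sketch for online mean estimation.

    eta: step size of the stochastic approximation (controls adaptivity).
    lam: threshold on the norm of the stochastic update (controls robustness).
    """
    theta = np.zeros_like(stream[0], dtype=float)
    for x in stream:
        h = x - theta                    # stochastic update direction
        if np.linalg.norm(h) <= lam:     # drop points with a large update norm
            theta = theta + eta * h      # biased stochastic approximation step
    return theta
```

A larger `eta` tracks changes in the data stream more quickly but amplifies noise, while a smaller `lam` rejects more outliers at the risk of also rejecting genuine changes; this is the robustness-adaptivity tradeoff the paper quantifies.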
