Paper Title
Semi-supervised Drifted Stream Learning with Short Lookback
Paper Authors
Paper Abstract
In many scenarios, 1) data streams are generated in real time; 2) labeled data are expensive, and only a limited number of labels are available at the beginning; 3) real-world data are not always i.i.d., and the data drift gradually over time; 4) storage of historical streams is limited, so the model can be updated only from a very short lookback window. This learning setting limits the applicability and availability of many Machine Learning (ML) algorithms. We formulate the learning task under this setting as the semi-supervised drifted stream learning with short lookback (SDSL) problem. SDSL poses two under-addressed challenges for existing methods in semi-supervised learning, continual learning, and domain adaptation: 1) robust pseudo-labeling under gradual shifts and 2) anti-forgetting adaptation with short lookback. To tackle these challenges, we propose a principled and generic generation-replay framework to solve SDSL. The framework accomplishes 1) robust pseudo-labeling in the generation step and 2) anti-forgetting adaptation in the replay step. To achieve robust pseudo-labeling, we develop a novel pseudo-label classification model that leverages supervised knowledge of previously labeled data, unsupervised knowledge of new data, and structural knowledge of invariant label semantics. To achieve adaptive anti-forgetting model replay, we propose to view the anti-forgetting adaptation task as a flat region search problem. We propose a novel minimax game-based replay objective function to solve the flat region search problem and develop an effective optimization solver. Finally, we present extensive experiments to demonstrate that our framework can effectively address the task of anti-forgetting learning in drifted streams with short lookback.
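The abstract does not give the form of the replay objective, so the following is only an illustrative sketch of what a generation-replay loop with a minimax "flat region" update could look like. It assumes a plain logistic-regression model and substitutes a sharpness-aware-style objective, min over w of max over ||e|| <= rho of loss(w + e), for the paper's own minimax game. Every name here (`pseudo_label`, `minimax_step`, `replay_update`, `rho`, `lr`) is hypothetical and not taken from the paper.

```python
import numpy as np


def loss_and_grad(w, X, y):
    """Mean logistic loss and its gradient for labels y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    tiny = 1e-12  # guards against log(0)
    loss = -np.mean(y * np.log(p + tiny) + (1 - y) * np.log(1 - p + tiny))
    grad = X.T @ (p - y) / len(y)
    return loss, grad


def pseudo_label(w, X_new):
    """Generation step (assumed form): label new data with the current model."""
    return (X_new @ w > 0).astype(float)


def minimax_step(w, X, y, rho=0.05, lr=0.1):
    """One first-order step on min_w max_{||e|| <= rho} loss(w + e)."""
    _, g = loss_and_grad(w, X, y)
    # Inner max: move to the (approximate) worst point within radius rho.
    e = rho * g / (np.linalg.norm(g) + 1e-12)
    # Outer min: descend using the gradient taken at that worst point.
    _, g_worst = loss_and_grad(w + e, X, y)
    return w - lr * g_worst


def replay_update(w, X_buf, y_buf, X_new, y_pseudo, steps=200):
    """Replay step (assumed form): fit on the short lookback buffer plus new data."""
    X = np.vstack([X_buf, X_new])
    y = np.concatenate([y_buf, y_pseudo])
    for _ in range(steps):
        w = minimax_step(w, X, y)
    return w
```

In this sketch, `X_buf` and `y_buf` stand for the short lookback window, and `X_new` is the incoming unlabeled batch. Preferring parameters whose neighborhood also has low loss is one common reading of "flat region"; the paper's actual objective and solver may differ.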