Paper Title
On the Marginal Benefit of Active Learning: Does Self-Supervision Eat Its Cake?
Paper Authors
Paper Abstract
Active learning is a set of techniques for intelligently selecting which examples in a large unlabeled dataset to label, in order to reduce labeling effort. In parallel, recent developments in self-supervised and semi-supervised learning (S4L) provide powerful techniques, based on data augmentation, contrastive learning, and self-training, that make far better use of unlabeled data and have led to significant reductions in the labels required on standard machine learning benchmarks. A natural question is whether these paradigms can be unified to obtain superior results. To this end, this paper provides a novel algorithmic framework integrating self-supervised pretraining, active learning, and consistency-regularized self-training. We conduct extensive experiments with our framework on the CIFAR10 and CIFAR100 datasets. These experiments allow us to isolate and assess the benefit of each individual component, evaluated using state-of-the-art methods (e.g., Core-Set, VAAL, SimCLR, FixMatch). Our experiments reveal two key insights: (i) self-supervised pretraining significantly improves semi-supervised learning, especially in the few-label regime; (ii) the benefit of active learning is undermined and subsumed by S4L techniques. Specifically, we fail to observe any additional benefit from state-of-the-art active learning algorithms when they are combined with state-of-the-art S4L techniques.
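To make the composition of the three stages concrete, the sketch below shows one way such a pipeline could be assembled: self-supervised pretraining, followed by rounds of active query selection, each followed by consistency-regularized self-training. It is a minimal illustration only; the helper names (simclr_pretrain, select_queries, annotate, fixmatch_train) are hypothetical placeholders, not the paper's implementation or any library API.

```python
# Purely illustrative sketch of a unified S4L + active-learning pipeline.
# All helpers below are hypothetical stand-ins for the three stages.

def run_pipeline(unlabeled_pool, label_budget, num_rounds):
    # Stage 1: self-supervised pretraining on the unlabeled pool
    # (e.g., contrastive learning in the style of SimCLR).
    encoder = simclr_pretrain(unlabeled_pool)
    model = init_classifier(encoder)

    labeled_set = []
    per_round = label_budget // num_rounds

    for _ in range(num_rounds):
        # Stage 2: active learning chooses which examples to label next
        # (e.g., Core-Set or VAAL); random sampling is the passive baseline.
        queries = select_queries(model, unlabeled_pool, per_round)
        labeled_set += annotate(queries)  # oracle supplies ground-truth labels
        unlabeled_pool = [x for x in unlabeled_pool if x not in queries]

        # Stage 3: consistency-regularized self-training (e.g., FixMatch),
        # trained on the small labeled set plus the remaining unlabeled pool.
        model = fixmatch_train(model, labeled_set, unlabeled_pool)

    return model
```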