Paper Title
On the Marginal Benefit of Active Learning: Does Self-Supervision Eat Its Cake?
Paper Authors
Paper Abstract
Active learning is a set of techniques for intelligently selecting which examples in a large unlabeled dataset to label, in order to reduce labeling effort. In parallel, recent developments in self-supervised and semi-supervised learning (S4L) provide powerful techniques, based on data augmentation, contrastive learning, and self-training, that make far better use of unlabeled data and have led to significant reductions in the labels required on standard machine learning benchmarks. A natural question is whether these paradigms can be unified to obtain superior results. To this end, this paper provides a novel algorithmic framework integrating self-supervised pretraining, active learning, and consistency-regularized self-training. We conduct extensive experiments with our framework on the CIFAR10 and CIFAR100 datasets. These experiments allow us to isolate and assess the benefit of each individual component, evaluated using state-of-the-art methods (e.g., Core-Set, VAAL, SimCLR, FixMatch). Our experiments reveal two key insights: (i) self-supervised pretraining significantly improves semi-supervised learning, especially in the few-label regime; (ii) the benefit of active learning is undermined and subsumed by S4L techniques. Specifically, we fail to observe any additional benefit from state-of-the-art active learning algorithms when they are combined with state-of-the-art S4L techniques.
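To make the composition of the three stages concrete, the sketch below shows one way such a pipeline could be assembled: self-supervised pretraining, followed by rounds of active query selection, each followed by consistency-regularized self-training. It is a minimal illustration only; the helper names (simclr_pretrain, select_queries, annotate, fixmatch_train) are hypothetical placeholders, not the paper's implementation or any library API.

```python
# Purely illustrative sketch of a unified S4L + active-learning pipeline.
# All helpers below are hypothetical stand-ins for the three stages.

def run_pipeline(unlabeled_pool, label_budget, num_rounds):
    # Stage 1: self-supervised pretraining on the unlabeled pool
    # (e.g., contrastive learning in the style of SimCLR).
    encoder = simclr_pretrain(unlabeled_pool)
    model = init_classifier(encoder)

    labeled_set = []
    per_round = label_budget // num_rounds

    for _ in range(num_rounds):
        # Stage 2: active learning chooses which examples to label next
        # (e.g., Core-Set or VAAL); random sampling is the passive baseline.
        queries = select_queries(model, unlabeled_pool, per_round)
        labeled_set += annotate(queries)  # oracle supplies ground-truth labels
        unlabeled_pool = [x for x in unlabeled_pool if x not in queries]

        # Stage 3: consistency-regularized self-training (e.g., FixMatch),
        # trained on the small labeled set plus the remaining unlabeled pool.
        model = fixmatch_train(model, labeled_set, unlabeled_pool)

    return model
```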