Paper Title

Towards Robust and Reproducible Active Learning Using Neural Networks

Paper Authors

Prateek Munjal, Nasir Hayat, Munawar Hayat, Jamshid Sourati, Shadab Khan

Paper Abstract

Active learning (AL) is a promising ML paradigm with the potential to parse through large unlabeled datasets and help reduce annotation cost in domains where labeling data can be prohibitive. Recently proposed neural-network-based AL methods use different heuristics to accomplish this goal. In this study, we demonstrate that, under identical experimental settings, different types of AL algorithms (uncertainty based, diversity based, and committee based) produce inconsistent gains over a random-sampling baseline. Through a variety of experiments that control for sources of stochasticity, we show that the variance in performance metrics achieved by AL algorithms can lead to results that are inconsistent with previously reported findings. We also find that, under strong regularization, AL methods show marginal or no advantage over the random-sampling baseline across a variety of experimental conditions. Finally, we conclude with a set of recommendations on how to assess the results of a new AL algorithm so that they are reproducible and robust under changes in experimental conditions. We share our code to facilitate AL evaluations. We believe our findings and recommendations will help advance reproducible research in AL using neural networks. We open-source our code at https://github.com/PrateekMunjal/TorchAL
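
The abstract contrasts heuristic acquisition functions with a random-sampling baseline and stresses controlling sources of stochasticity. The following is a minimal, hypothetical sketch of such an AL loop in Python (NumPy only); it is not the authors' TorchAL implementation, and the pool size, budget, acquisition strategies, and placeholder "model predictions" are illustrative assumptions rather than settings from the paper.

# A minimal sketch (not the authors' TorchAL code) of an active-learning loop
# comparing a random-sampling baseline with an entropy-based (uncertainty)
# acquisition function. All sizes below are illustrative placeholders.
import numpy as np


def set_all_seeds(seed: int) -> None:
    """Fix the random seed so that repeated runs are comparable."""
    np.random.seed(seed)
    # A real PyTorch experiment would also seed torch and torch.cuda and
    # enable deterministic cuDNN kernels to control stochasticity.


def entropy(probs: np.ndarray) -> np.ndarray:
    """Predictive entropy per sample; higher means more uncertain."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)


def acquire(unlabeled_idx: np.ndarray, probs: np.ndarray, budget: int,
            strategy: str = "random") -> np.ndarray:
    """Pick `budget` indices from the unlabeled pool."""
    if strategy == "random":
        return np.random.choice(unlabeled_idx, size=budget, replace=False)
    if strategy == "uncertainty":
        scores = entropy(probs[unlabeled_idx])
        return unlabeled_idx[np.argsort(scores)[-budget:]]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    set_all_seeds(0)
    n_pool, n_classes, budget = 1000, 10, 100
    # Placeholder "model predictions"; a real experiment would retrain the
    # network on the current labeled set at every AL iteration.
    probs = np.random.dirichlet(np.ones(n_classes), size=n_pool)
    labeled = np.random.choice(n_pool, size=100, replace=False)
    unlabeled = np.setdiff1d(np.arange(n_pool), labeled)

    for step in range(3):
        new_idx = acquire(unlabeled, probs, budget, strategy="uncertainty")
        labeled = np.concatenate([labeled, new_idx])
        unlabeled = np.setdiff1d(unlabeled, new_idx)
        print(f"iteration {step}: labeled set size = {len(labeled)}")

In a full experiment, the placeholder probabilities would come from a network retrained on the labeled set at each iteration, and both acquisition strategies would be run across multiple seeds so that the variance the paper highlights can be measured against the random-sampling baseline.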
