Paper title
Bayesian active learning for production, a systematic study and a reusable library
Paper authors
Paper abstract
Active learning reduces labelling effort by using a machine learning model to query the user for labels on specific inputs. While there are many papers on new active learning techniques, these techniques rarely satisfy the constraints of a real-world project. In this paper, we analyse the main drawbacks of current active learning techniques and present approaches to alleviate them. We conduct a systematic study of how the most common issues of real-world datasets affect the deep active learning process: model convergence, annotation error, and dataset imbalance. We derive two techniques that can speed up the active learning loop: partial uncertainty sampling and a larger query size. Finally, we present our open-source Bayesian active learning library, BaaL.