Paper Title
Error-Bounded Correction of Noisy Labels
Paper Authors
Abstract
Collecting large-scale annotated data inevitably introduces label noise, i.e., incorrect class labels. To be robust against label noise, many successful methods rely on noisy classifiers (i.e., models trained on the noisy training data) to determine whether a label is trustworthy. However, it remains unknown why this heuristic works well in practice. In this paper, we provide the first theoretical explanation for these methods. We prove that the prediction of a noisy classifier can indeed be a good indicator of whether the label of a training example is clean. Based on this theoretical result, we propose a novel algorithm that corrects labels based on the noisy classifier's prediction. The corrected labels are consistent with the true Bayesian optimal classifier with high probability. We incorporate our label correction algorithm into the training of deep neural networks and train models that achieve superior testing performance on multiple public datasets.
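The abstract does not spell out the correction rule, so the following is only a minimal sketch of what "correcting labels based on the noisy classifier's prediction" could look like. It assumes a simple ratio test: a label is replaced by the classifier's top class when the probability assigned to the given label is small relative to the top probability. The function name `correct_labels`, the threshold `delta`, and its default value are hypothetical choices for illustration, not taken from the paper.

```python
def correct_labels(probs, noisy_labels, delta=0.5):
    """Sketch of prediction-based label correction (assumed rule, see lead-in).

    probs:        list of per-example class-probability lists, produced by a
                  classifier trained on the noisy labels.
    noisy_labels: list of observed (possibly incorrect) integer labels.
    delta:        hypothetical threshold in (0, 1]; smaller values flip fewer labels.

    Returns (corrected_labels, flipped_flags).
    """
    corrected = []
    flipped = []
    for p, y in zip(probs, noisy_labels):
        # Class the noisy classifier considers most likely.
        top = max(range(len(p)), key=lambda c: p[c])
        # Probability of the given label relative to the top probability.
        ratio = p[y] / p[top]
        # Distrust the given label only when the classifier strongly disagrees.
        flip = ratio < delta
        corrected.append(top if flip else y)
        flipped.append(flip)
    return corrected, flipped


# Usage: three examples, two classes.
probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
noisy_labels = [1, 0, 1]
corrected, flipped = correct_labels(probs, noisy_labels, delta=0.5)
print(corrected)  # [0, 0, 1]
print(flipped)    # [True, False, False]
```

In this sketch only the first example is relabeled: the classifier gives its label a probability of 0.1 against 0.9 for the top class, while the second example's mild disagreement (0.4 against 0.6) stays above the threshold. In a training loop, such a step would presumably be applied periodically to the training labels as the network's predictions improve.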