Paper Title
Inconsistency Ranking-based Noisy Label Detection for High-quality Data
Paper Authors
Paper Abstract
The success of deep learning depends on large amounts of high-quality annotated data. In practice, however, dataset size and dataset quality trade off against each other, because data collection and cleaning are expensive and time-consuming. In real-world applications, especially those built on crowdsourced datasets, it is important to exclude noisy labels. To address this, this paper proposes an automatic noisy label detection (NLD) technique that uses inconsistency ranking to obtain high-quality data. We apply this technique to the automatic speaker verification (ASV) task as a proof of concept. We investigate both inter-class and intra-class inconsistency ranking and compare several metric learning loss functions under different noise settings. Experimental results confirm that the proposed solution can improve both the efficiency and the effectiveness of cleaning large-scale speaker recognition datasets.
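To make the idea of intra-class inconsistency ranking concrete, here is a minimal sketch in Python. It is an illustration only, not the method defined in the paper: the abstract does not specify the scoring function, so this sketch assumes that each utterance already has a speaker embedding, that inconsistency is one minus the cosine similarity between an embedding and the centroid of its labeled class, and that the highest-scoring samples are the ones flagged for review. The function names `intra_class_inconsistency` and `rank_suspects` are hypothetical.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def intra_class_inconsistency(embeddings, labels):
    """Score each sample by 1 - cosine similarity to its labeled class centroid.

    Assumption for illustration: a larger score means the sample agrees less
    with the other samples that carry the same label.
    """
    unit = [_normalize(e) for e in embeddings]

    # Sum the unit embeddings of each class, then normalize to get a centroid.
    sums = {}
    for vec, lab in zip(unit, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + x for s, x in zip(sums[lab], vec)]
    centroids = {lab: _normalize(s) for lab, s in sums.items()}

    return [
        1.0 - sum(a * b for a, b in zip(vec, centroids[lab]))
        for vec, lab in zip(unit, labels)
    ]


def rank_suspects(scores, top_k):
    """Return the indices of the top_k most inconsistent samples."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


# Toy data: sample 3 looks like speaker "B" but is labeled "A".
toy_embeddings = [
    [1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0],   # labeled "A"
    [0.0, 1.0], [0.1, 0.9], [0.05, 0.95],               # labeled "B"
]
toy_labels = ["A", "A", "A", "A", "B", "B", "B"]

toy_scores = intra_class_inconsistency(toy_embeddings, toy_labels)
suspects = rank_suspects(toy_scores, top_k=1)
```

In this toy set the mislabeled sample pulls the "A" centroid toward itself but still ends up farthest from it, so it is ranked first. An inter-class variant could, under the same assumptions, compare each sample against the centroids of the other classes instead.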