Paper Title

Is BERT Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification

Authors

Zhu, Dawei, Hedderich, Michael A., Zhai, Fangzhou, Adelani, David Ifeoluwa, Klakow, Dietrich

Abstract

Incorrect labels in training data occur when human annotators make mistakes or when the data is generated via weak or distant supervision. It has been shown that complex noise-handling techniques - by modeling, cleaning, or filtering the noisy instances - are required to prevent models from fitting this label noise. However, we show in this work that, for text classification tasks with modern NLP models like BERT, over a variety of noise types, existing noise-handling methods do not always improve performance and may even deteriorate it, suggesting the need for further investigation. We also back our observations with a comprehensive analysis.
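Studies of this kind typically evaluate robustness by injecting synthetic noise into a clean training set. As a minimal sketch of one common noise model (uniform/symmetric label flipping), assuming nothing about the paper's actual implementation - the function name and interface below are illustrative:

```python
import random

def inject_uniform_noise(labels, num_classes, noise_rate, seed=0):
    """Flip each label to a different, uniformly chosen class with
    probability `noise_rate`. This is the standard symmetric-noise
    model used in label-noise experiments; the helper itself is a
    hypothetical example, not code from the paper."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # choose a wrong class uniformly among the remaining classes
            y = rng.choice([c for c in range(num_classes) if c != y])
        noisy.append(y)
    return noisy

# Usage: corrupt 30% of a toy 3-class label list
clean = [0, 1, 2, 1, 0, 2] * 50
noisy = inject_uniform_noise(clean, num_classes=3, noise_rate=0.3)
observed_rate = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
```

A model is then trained on the noisy labels but evaluated against the clean test set; other noise types (e.g. class-dependent or instance-dependent noise) replace the uniform flip above with a structured corruption.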
