Paper Title

Is BERT Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification

Authors

Zhu, Dawei, Hedderich, Michael A., Zhai, Fangzhou, Adelani, David Ifeoluwa, Klakow, Dietrich

Abstract

Incorrect labels in training data occur when human annotators make mistakes or when the data is generated via weak or distant supervision. It has been shown that complex noise-handling techniques - by modeling, cleaning, or filtering the noisy instances - are required to prevent models from fitting this label noise. However, we show in this work that, for text classification tasks with modern NLP models like BERT, over a variety of noise types, existing noise-handling methods do not always improve performance and may even deteriorate it, suggesting the need for further investigation. We also back our observations with a comprehensive analysis.
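Studies of this kind typically evaluate robustness by injecting synthetic noise into a clean training set. As a minimal sketch of one common noise model (uniform/symmetric label flipping), assuming nothing about the paper's actual implementation - the function name and interface below are illustrative:

```python
import random

def inject_uniform_noise(labels, num_classes, noise_rate, seed=0):
    """Flip each label to a different, uniformly chosen class with
    probability `noise_rate`. This is the standard symmetric-noise
    model used in label-noise experiments; the helper itself is a
    hypothetical example, not code from the paper."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # choose a wrong class uniformly among the remaining classes
            y = rng.choice([c for c in range(num_classes) if c != y])
        noisy.append(y)
    return noisy

# Usage: corrupt 30% of a toy 3-class label list
clean = [0, 1, 2, 1, 0, 2] * 50
noisy = inject_uniform_noise(clean, num_classes=3, noise_rate=0.3)
observed_rate = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
```

A model is then trained on the noisy labels but evaluated against the clean test set; other noise types (e.g. class-dependent or instance-dependent noise) replace the uniform flip above with a structured corruption.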
