Paper Title

Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks

Paper Authors

Lichao Sun, Congying Xia, Wenpeng Yin, Tingting Liang, Philip S. Yu, Lifang He

Paper Abstract

Mixup is the latest data augmentation technique that linearly interpolates input examples and the corresponding labels. It has shown strong effectiveness in image classification by interpolating images at the pixel level. Inspired by this line of research, in this paper, we explore i) how to apply mixup to natural language processing tasks since text data can hardly be mixed in the raw format; ii) if mixup is still effective in transformer-based learning models, e.g., BERT. To achieve the goal, we incorporate mixup into the transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks while keeping the whole end-to-end training system. We evaluate the proposed framework by running extensive experiments on the GLUE benchmark. Furthermore, we also examine the performance of mixup-transformer in low-resource scenarios by reducing the training data with a certain ratio. Our studies show that mixup is a domain-independent data augmentation technique for pre-trained language models, resulting in significant performance improvement for transformer-based models.
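
The abstract describes mixup as linear interpolation of input examples and their labels, applied at the representation level because raw text cannot be mixed directly. The sketch below illustrates this idea in PyTorch by mixing pooled sentence representations (e.g., BERT's [CLS] vectors) and one-hot labels before the classifier head; the function name `mixup_hidden`, the Beta parameter `alpha=0.5`, and the toy classifier are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of representation-level mixup for a transformer classifier.
# Assumptions (not from the paper): function/variable names, alpha=0.5, and
# the random tensors standing in for encoder outputs.
import torch
import torch.nn.functional as F


def mixup_hidden(hidden, labels, num_classes, alpha=0.5):
    """Linearly interpolate hidden representations and one-hot labels.

    hidden: (batch, dim) sentence representations from the encoder.
    labels: (batch,) integer class labels.
    Returns mixed representations and soft (interpolated) labels.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(hidden.size(0))      # pair each example with a random partner
    one_hot = F.one_hot(labels, num_classes).float()
    mixed_hidden = lam * hidden + (1 - lam) * hidden[perm]
    mixed_labels = lam * one_hot + (1 - lam) * one_hot[perm]
    return mixed_hidden, mixed_labels


if __name__ == "__main__":
    # Usage: mix the pooled output, then train the classifier head on soft labels.
    batch, dim, num_classes = 8, 768, 2
    pooled = torch.randn(batch, dim)           # stands in for BERT's pooled [CLS] output
    labels = torch.randint(0, num_classes, (batch,))
    classifier = torch.nn.Linear(dim, num_classes)

    mixed, soft_labels = mixup_hidden(pooled, labels, num_classes)
    logits = classifier(mixed)
    # Cross-entropy against soft labels, since mixed targets are no longer one-hot.
    loss = -(soft_labels * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
    print(loss.item())
```

Mixing the pooled representations rather than token sequences keeps the pipeline end-to-end trainable, which matches the abstract's point that text is hard to interpolate in its raw format.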
