Paper title
GenAug: Data Augmentation for Finetuning Text Generators
Authors
Abstract
In this paper, we investigate data augmentation for text generation, which we call GenAug. Text generation and language modeling are important tasks within natural language processing, and are especially challenging for low-data regimes. We propose and evaluate various augmentation methods, including some that incorporate external knowledge, for finetuning GPT-2 on a subset of Yelp Reviews. We also examine the relationship between the amount of augmentation and the quality of the generated text. We utilize several metrics that evaluate important aspects of the generated text including its diversity and fluency. Our experiments demonstrate that insertion of character-level synthetic noise and keyword replacement with hypernyms are effective augmentation methods, and that the quality of generations improves to a peak at approximately three times the amount of original data.
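To make the idea of character-level synthetic noise concrete, here is a minimal Python sketch. The abstract does not specify the exact noise procedure, so everything below is an assumption for illustration: the function names `add_char_noise` and `augment`, the choice of insert/delete/swap operations, and the `rate` and `copies` parameters are all hypothetical and are not taken from the paper.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def add_char_noise(text, rate=0.05, seed=None):
    """Return a copy of `text` with random character-level edits.

    Hypothetical illustration: each alphabetic character is perturbed with
    probability `rate` by inserting a random letter after it, deleting it,
    or swapping it with the next letter. Non-alphabetic characters
    (spaces, punctuation, digits) are left untouched.
    """
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(["insert", "delete", "swap"])
            if op == "insert":
                out.append(c)
                out.append(rng.choice(ALPHABET))
            elif op == "delete":
                pass  # drop this character
            elif i + 1 < len(chars) and chars[i + 1].isalpha():
                # swap with the following letter and skip over it
                out.append(chars[i + 1])
                out.append(c)
                i += 1
            else:
                out.append(c)  # nothing to swap with; keep as-is
        else:
            out.append(c)
        i += 1
    return "".join(out)


def augment(examples, copies=2, rate=0.05, seed=0):
    """Return the original examples followed by `copies` noisy variants of each.

    The result holds (copies + 1) times as many examples as the input.
    `copies` is an assumed knob for the amount of augmentation.
    """
    rng = random.Random(seed)
    augmented = list(examples)
    for _ in range(copies):
        for text in examples:
            augmented.append(add_char_noise(text, rate, seed=rng.randrange(2**32)))
    return augmented


reviews = ["The food was great and the staff were friendly."]
for line in augment(reviews, copies=2, rate=0.1):
    print(line)
```

In this sketch the original examples are kept unchanged and the noisy variants are appended, so the amount of augmentation can be varied through `copies` when studying how generation quality changes with dataset size.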