Paper Title
Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering
Authors
Abstract
Question Answering (QA) is in increasing demand as the amount of information available online and the desire for quick access to this content grow. A common approach to QA has been to fine-tune a pretrained language model on a task-specific labeled dataset. This paradigm, however, relies on scarce, and costly to obtain, large-scale human-labeled data. We propose an unsupervised approach to training QA models with generated pseudo-training data. We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance by allowing the model to learn more complex context-question relationships. Training a QA model on this data gives a relative improvement over a previous unsupervised model in F1 score on the SQuAD dataset by about 14%, and 20% when the answer is a named entity, achieving state-of-the-art performance on SQuAD for unsupervised QA.
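The core idea of template-based question generation can be illustrated with a minimal sketch. The template below, the wh-word mapping, and the `make_question` helper are assumptions for illustration, not the paper's exact templates: given a retrieved sentence containing a named-entity answer, the answer span is swapped for a wh-word chosen by entity type to form a pseudo-question paired with that answer.

```python
# Hypothetical cloze-style template: replace the answer span in a
# retrieved sentence with a wh-word keyed by the answer's entity type.
# The entity-type labels follow common NER conventions (e.g. spaCy's).
WH_BY_TYPE = {"PERSON": "who", "GPE": "where", "DATE": "when", "ORG": "what"}

def make_question(retrieved_sentence: str, answer: str, answer_type: str):
    """Turn a retrieved sentence into a pseudo-question for the given answer."""
    wh = WH_BY_TYPE.get(answer_type, "what")
    if answer not in retrieved_sentence:
        # The template only applies when the answer appears in the sentence.
        return None
    return retrieved_sentence.replace(answer, wh, 1).rstrip(".") + "?"

# Example: the (question, answer) pair would train the QA model, while the
# original context passage (not the retrieved sentence) serves as input.
q = make_question("Einstein was born in Ulm in 1879.", "Ulm", "GPE")
print(q)  # -> Einstein was born in where in 1879?
```

Such pseudo-questions are ungrammatical, but because they are built from a *retrieved* sentence rather than the context sentence itself, the QA model cannot rely on trivial lexical overlap and must learn richer context-question relationships.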