Paper Title
Harvesting and Refining Question-Answer Pairs for Unsupervised QA
Paper Authors
Paper Abstract
Question Answering (QA) has shown great success thanks to the availability of large-scale datasets and the effectiveness of neural models. Recent research has attempted to extend these successes to settings with few or no labeled data. In this work, we introduce two approaches to improve unsupervised QA. First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA). Second, we take advantage of the QA model to extract more appropriate answers, iteratively refining the data in RefQA. We conduct experiments on SQuAD 1.1 and NewsQA by fine-tuning BERT without access to manually annotated data. Our approach outperforms previous unsupervised approaches by a large margin and is competitive with early supervised models. We also show the effectiveness of our approach in the few-shot learning setting.
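The second step described in the abstract (using the QA model to refine RefQA answers) can be pictured roughly as follows. This is a minimal sketch and not the authors' implementation: the predict callable, the confidence threshold, and the replace-or-drop rule are illustrative assumptions only.

from typing import Callable, List, Tuple

# A RefQA-style example: (context passage, question, current answer text).
QAPair = Tuple[str, str, str]

def refine_refqa(
    pairs: List[QAPair],
    predict: Callable[[str, str], Tuple[str, float]],
    score_threshold: float = 0.5,
) -> List[QAPair]:
    """One refinement pass: re-answer each RefQA question with the current QA
    model and, when the model confidently disagrees with the stored answer,
    adopt the model's answer instead (assumed rule, for illustration)."""
    refined: List[QAPair] = []
    for context, question, answer in pairs:
        pred_answer, score = predict(question, context)
        if pred_answer == answer:
            # Model agrees with the harvested answer: keep the pair as-is.
            refined.append((context, question, answer))
        elif score >= score_threshold and pred_answer in context:
            # Confident disagreement with an answer found in the passage:
            # replace the harvested answer with the model's prediction.
            refined.append((context, question, pred_answer))
        # Otherwise drop the pair as likely noise (also an assumption).
    return refined

# Toy usage with a stub predictor standing in for a fine-tuned BERT QA model.
def stub_predict(question: str, context: str) -> Tuple[str, float]:
    return ("Paris", 0.9)

pairs = [("Paris is the capital of France.",
          "What is the capital of France?",
          "France")]
print(refine_refqa(pairs, stub_predict))  # answer refined to "Paris"

In the iterative setting suggested by the abstract, such a pass would alternate with re-training the QA model on the refined pairs, so that later passes use a stronger model to clean the data further.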