Paper Title
How Context Affects Language Models' Factual Predictions
Paper Authors
Paper Abstract
When pre-trained on large unsupervised textual corpora, language models are able to store and retrieve factual knowledge to some extent, making it possible to use them directly for zero-shot cloze-style question answering. However, storing factual knowledge in a fixed number of weights of a language model clearly has limitations. Previous approaches have successfully provided access to information outside the model weights using supervised architectures that combine an information retrieval system with a machine reading component. In this paper, we go a step further and integrate information from a retrieval system with a pre-trained language model in a purely unsupervised way. We report that augmenting pre-trained language models in this way dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline. Furthermore, processing query and context with different segment tokens allows BERT to utilize its Next Sentence Prediction pre-trained classifier to determine whether the context is relevant or not, substantially improving BERT's zero-shot cloze-style question-answering performance and making its predictions robust to noisy contexts.
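The last point can be illustrated with a minimal sketch of how a retrieved context passage and a cloze query might be packed into a single BERT-style input. The helper name and the example tokens below are hypothetical (the actual system would use BERT's own tokenizer and vocabulary); the sketch only shows the segment-id layout that lets the Next Sentence Prediction head judge context relevance.

```python
# Hypothetical sketch: pack a retrieved context and a cloze query into one
# BERT-style input, giving the context segment id 0 and the query segment
# id 1, as the abstract describes. Real usage would rely on BERT's tokenizer;
# here plain token strings stand in for illustration.

def build_nsp_cloze_input(context_tokens, query_tokens):
    """Assemble [CLS] context [SEP] query [SEP] with matching segment ids."""
    tokens = ["[CLS]"] + context_tokens + ["[SEP]"] + query_tokens + ["[SEP]"]
    # Segment A (id 0) spans [CLS], the context, and the first [SEP];
    # segment B (id 1) spans the query and the final [SEP].
    segment_ids = [0] * (len(context_tokens) + 2) + [1] * (len(query_tokens) + 1)
    return tokens, segment_ids

# Example: a context sentence paired with a masked cloze query.
tokens, segment_ids = build_nsp_cloze_input(
    ["dante", "was", "born", "in", "florence"],
    ["dante", "was", "born", "in", "[MASK]"],
)
```

With this layout, the NSP classifier reads the `[CLS]` position to score whether segment B (the query) plausibly follows segment A (the context), which is what makes the model's predictions robust to irrelevant or noisy retrieved passages.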