Paper Title

DeepStruct: Pretraining of Language Models for Structure Prediction

Paper Authors

Chenguang Wang, Xiao Liu, Zui Chen, Haoyun Hong, Jie Tang, Dawn Song

Paper Abstract

We introduce a method for improving the structural understanding abilities of language models. Unlike previous approaches that finetune the models with task-specific augmentation, we pretrain language models on a collection of task-agnostic corpora to generate structures from text. Our structure pretraining enables zero-shot transfer of the learned knowledge that models have about the structure tasks. We study the performance of this approach on 28 datasets, spanning 10 structure prediction tasks including open information extraction, joint entity and relation extraction, named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, factual probe, intent detection, and dialogue state tracking. We further enhance the pretraining with the task-specific training sets. We show that a 10B parameter language model transfers non-trivially to most tasks and obtains state-of-the-art performance on 21 of 28 datasets that we evaluate.
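
The abstract frames every structure prediction task as generating structures (triples) from text with a single language model. As a rough illustration of that text-to-structure formulation, the sketch below casts joint entity and relation extraction as sequence-to-sequence triple generation. It is only a minimal assumption-laden example: DeepStruct pretrains a 10B-parameter model on task-agnostic corpora, whereas this sketch uses an off-the-shelf T5 checkpoint as a stand-in, and the linearized triple format shown in the comments is illustrative rather than the paper's exact output scheme.

```python
# Hypothetical sketch: structure prediction as text-to-triple generation.
# T5 is used here only as a stand-in seq2seq model; DeepStruct itself
# pretrains a 10B GLM on task-agnostic structure corpora.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Input sentence; the target during structure pretraining would be a
# linearized set of (head, relation, tail) triples, e.g.
# "( Barack Obama ; place of birth ; Honolulu )".
text = "Barack Obama was born in Honolulu."
inputs = tokenizer(text, return_tensors="pt")

# A structure-pretrained model would emit the triple string directly;
# an untrained checkpoint will not, so this only shows the interface.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because tasks such as named entity recognition, relation classification, and dialogue state tracking can all be expressed in this shared triple format, a single pretrained generator can transfer to them zero-shot, which is the transfer behavior the abstract reports.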
