Paper Title

ERNIE-Doc: A Retrospective Long-Document Modeling Transformer

Authors

Siyu Ding, Junyuan Shang, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang

Abstract

Transformers are not suited for processing long documents, due to their quadratically increasing memory and time consumption. Simply truncating a long document or applying the sparse attention mechanism will incur the context fragmentation problem or lead to an inferior modeling capability against comparable model sizes. In this paper, we propose ERNIE-Doc, a document-level language pretraining model based on Recurrence Transformers. Two well-designed techniques, namely the retrospective feed mechanism and the enhanced recurrence mechanism, enable ERNIE-Doc, which has a much longer effective context length, to capture the contextual information of a complete document. We pretrain ERNIE-Doc to explicitly learn the relationships among segments with an additional document-aware segment-reordering objective. Various experiments were conducted on both English and Chinese document-level tasks. ERNIE-Doc improved the state-of-the-art language modeling result of perplexity to 16.8 on WikiText-103. Moreover, it outperformed competitive pretraining models by a large margin on most language understanding tasks, such as text classification and question answering.
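The abstract names the two core techniques but does not spell out how they change segment-level recurrence. Below is a minimal, hypothetical sketch (not the authors' implementation), assuming the enhanced recurrence mechanism lets layer n of a new segment attend to the cached output of the same layer n from the previous segment, whereas Transformer-XL-style recurrence reuses layer n-1's cache. The names (RecurrentLayer, run_segment), tensor shapes, and the first-layer handling are illustrative assumptions; the retrospective feed mechanism and the segment-reordering objective are not shown.

```python
# Minimal sketch (NOT the authors' code) of segment-level recurrence with a
# per-layer memory cache. The `enhanced` flag switches between a
# Transformer-XL-style cache (layer n reuses layer n-1's states from the
# previous segment) and the same-layer cache described for ERNIE-Doc's
# enhanced recurrence mechanism. Shapes, names, and the first-layer handling
# are illustrative assumptions only.
from typing import List, Optional, Tuple

import torch
import torch.nn as nn


class RecurrentLayer(nn.Module):
    """One self-attention layer that can prepend cached states as extra keys/values."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor, memory: Optional[torch.Tensor]) -> torch.Tensor:
        # Stop-gradient on the memory, then let the current segment attend to it.
        kv = x if memory is None else torch.cat([memory.detach(), x], dim=1)
        out, _ = self.attn(x, kv, kv, need_weights=False)
        return out


def run_segment(
    layers: nn.ModuleList,
    x: torch.Tensor,
    memories: Optional[List[torch.Tensor]],
    enhanced: bool,
) -> Tuple[torch.Tensor, List[torch.Tensor]]:
    """Process one segment; return its top-layer output and the new per-layer cache."""
    new_memories = []
    h = x
    for n, layer in enumerate(layers):
        if memories is None:
            mem = None                                # first segment: no history
        elif enhanced:
            mem = memories[n]                         # same-layer cache (enhanced recurrence)
        else:
            mem = memories[n - 1] if n > 0 else None  # previous-layer cache (XL-style)
        h = layer(h, mem)
        new_memories.append(h)
    return h, new_memories


if __name__ == "__main__":
    d_model, seg_len = 64, 16
    layers = nn.ModuleList([RecurrentLayer(d_model) for _ in range(4)])
    # A "document" split into fixed-length segments, fed one segment at a time.
    segments = [torch.randn(1, seg_len, d_model) for _ in range(3)]

    mems: Optional[List[torch.Tensor]] = None
    for seg in segments:
        _, mems = run_segment(layers, seg, mems, enhanced=True)
```

Under the same-layer cache, the states stored at layer n already summarize all earlier segments seen at that layer, which matches the abstract's claim of a much longer effective context length than layer-bounded XL-style recurrence; relative position encoding, the retrospective feed pass over the whole document, and the segment-reordering pretraining objective are omitted here.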
