Paper Title
Dependency-based Mixture Language Models
Paper Authors
Paper Abstract
Various models have been proposed to incorporate knowledge of syntactic structures into neural language models. However, previous works have relied heavily on elaborate components designed for a specific language model, usually a recurrent neural network (RNN), which makes them unwieldy in practice to fit into other neural language models such as the Transformer and GPT-2. In this paper, we introduce the Dependency-based Mixture Language Models. In detail, we first train neural language models with a novel dependency modeling objective to learn the probability distribution of future dependent tokens given the context. We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention. Extensive experiments and human evaluations show that our method can be easily and effectively applied to different neural language models while improving neural text generation on various tasks.
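As a rough illustration of the mixing step described in the abstract (not the authors' code), the sketch below forms the next-token distribution as a convex combination of per-context-token dependency distributions, weighted by self-attention. The names `dep_dists` and `attn_weights` and the exact shapes are assumptions for this toy example.

```python
# Minimal sketch, assuming: for each context token x_j we already have a
# dependency-modeling distribution over its future dependent tokens, and the
# current position's self-attention weights over the context sum to 1.
import torch

def next_token_distribution(dep_dists: torch.Tensor,
                            attn_weights: torch.Tensor) -> torch.Tensor:
    """Mix per-token dependency distributions with attention weights.

    dep_dists:    (seq_len, vocab_size), each row a probability distribution.
    attn_weights: (seq_len,), non-negative and summing to 1.
    Returns a (vocab_size,) next-token distribution.
    """
    # Convex combination of the rows of dep_dists -> still a valid distribution.
    return attn_weights @ dep_dists

# Toy usage: 3 context tokens, vocabulary of 5 types.
dep_dists = torch.softmax(torch.randn(3, 5), dim=-1)
attn_weights = torch.softmax(torch.randn(3), dim=-1)
p_next = next_token_distribution(dep_dists, attn_weights)
print(p_next, p_next.sum())  # the mixture sums to 1
```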