Paper Title
MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models
Paper Authors
Paper Abstract
Existing pre-trained large language models have shown unparalleled generative capabilities. However, they are not controllable. In this paper, we propose MEGATRON-CNTRL, a novel framework that uses large-scale language models and adds control to text generation by incorporating an external knowledge base. Our framework consists of a keyword predictor, a knowledge retriever, a contextual knowledge ranker, and a conditional text generator. As we do not have access to ground-truth supervision for the knowledge ranker, we make use of weak supervision from sentence embedding. The empirical results show that our model generates more fluent, consistent, and coherent stories with less repetition and higher diversity compared to prior work on the ROC story dataset. We showcase the controllability of our model by replacing the keywords used to generate stories and re-running the generation process. Human evaluation results show that 77.5% of these stories are successfully controlled by the new keywords. Furthermore, by scaling our model from 124 million to 8.3 billion parameters we demonstrate that larger models improve both the quality of generation (from 74.5% to 93.0% for consistency) and controllability (from 77.5% to 91.5%).
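Below is a minimal, hypothetical sketch of the four-stage pipeline the abstract describes (keyword predictor, knowledge retriever, contextual knowledge ranker, conditional text generator). All function names and outputs are illustrative stubs and assumptions, not the paper's implementation; in the actual system the ranker is trained with weak supervision from sentence embeddings and the generator is a large Megatron language model.

```python
# Illustrative sketch of the MEGATRON-CNTRL generation loop described in the
# abstract. Every component below is a stub; names and outputs are hypothetical.

from typing import List


def predict_keywords(context: List[str]) -> List[str]:
    """Keyword predictor: propose control keywords for the next sentence (stub)."""
    return ["lottery"]  # placeholder output


def retrieve_knowledge(keywords: List[str]) -> List[str]:
    """Knowledge retriever: pull related facts from an external knowledge base (stub)."""
    return [f"{kw} is related to winning money" for kw in keywords]


def rank_knowledge(context: List[str], facts: List[str], top_k: int = 2) -> List[str]:
    """Contextual knowledge ranker: keep the facts most relevant to the story so far.
    In the paper this ranker is weakly supervised with sentence embeddings; here we
    simply truncate (stub)."""
    return facts[:top_k]


def generate_sentence(context: List[str], facts: List[str]) -> str:
    """Conditional text generator: a large language model conditioned on the story
    context and the selected knowledge (stub)."""
    return "She hoped the lottery ticket would change her life."


def generate_story(prompt: str, num_sentences: int = 4) -> List[str]:
    """Compose the four stages sentence by sentence; controllability comes from
    swapping the predicted keywords before re-running generation."""
    story = [prompt]
    for _ in range(num_sentences):
        keywords = predict_keywords(story)                 # step 1: control keywords
        facts = retrieve_knowledge(keywords)               # step 2: external knowledge
        selected = rank_knowledge(story, facts)            # step 3: contextual ranking
        story.append(generate_sentence(story, selected))   # step 4: conditional generation
    return story


if __name__ == "__main__":
    print(generate_story("Jenny bought a lottery ticket."))
```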