Paper Title
Investigating Pretrained Language Models for Graph-to-Text Generation
Paper Authors
Paper Abstract
Graph-to-text generation aims to generate fluent texts from graph-based data. In this paper, we investigate two recently proposed pretrained language models (PLMs) and analyze the impact of different task-adaptive pretraining strategies for PLMs in graph-to-text generation. We present a study across three graph domains: meaning representations, Wikipedia knowledge graphs (KGs) and scientific KGs. We show that the PLMs BART and T5 achieve new state-of-the-art results and that task-adaptive pretraining strategies improve their performance even further. In particular, we report new state-of-the-art BLEU scores of 49.72 on LDC2017T10, 59.70 on WebNLG, and 25.66 on AGENDA datasets - a relative improvement of 31.8%, 4.5%, and 42.4%, respectively. In an extensive analysis, we identify possible reasons for the PLMs' success on graph-to-text tasks. We find evidence that their knowledge about true facts helps them perform well even when the input graph representation is reduced to a simple bag of node and edge labels.
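As a rough illustration of the setting described in the abstract (not the authors' actual pipeline), the sketch below linearizes a tiny knowledge graph into a flat string of node and edge labels and feeds it to an off-the-shelf T5 model through Hugging Face Transformers. The <H>/<R>/<T> markers, the prompt prefix, and the example triples are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch: knowledge-graph triples -> linearized text -> pretrained seq2seq PLM.
# The special markers and prompt wording are assumptions for illustration only.
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Hypothetical example triples (subject, relation, object).
triples = [
    ("Alan Bean", "occupation", "Test pilot"),
    ("Alan Bean", "was a crew member of", "Apollo 12"),
]

# Flatten the graph into a simple sequence of node and edge labels.
graph_text = " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in triples)

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Encode the linearized graph and generate a fluent text description.
inputs = tokenizer("translate Graph to English: " + graph_text,
                   return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In this sketch the graph structure is reduced to a flat string, which mirrors the "bag of node and edge labels" condition analyzed in the paper; task-adaptive pretraining or fine-tuning on graph-to-text data would be applied on top of such a model.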