Paper Title
Multilingual Neural RST Discourse Parsing
Paper Authors
Paper Abstract
Text discourse parsing plays an important role in understanding information flow and argumentative structure in natural language. Previous research under Rhetorical Structure Theory (RST) has mostly focused on inducing and evaluating models from the English treebank. However, parsing tasks for other languages such as German, Dutch, and Portuguese remain challenging due to the shortage of annotated data. In this work, we investigate two approaches to building a neural, cross-lingual discourse parser: (1) utilizing multilingual vector representations; and (2) adopting segment-level translation of the source content. Experimental results show that both methods are effective even with limited training data and achieve state-of-the-art performance on all sub-tasks of cross-lingual, document-level discourse parsing.