A Survey of Implicit Discourse Relation Recognition

Paper Title

A Survey of Implicit Discourse Relation Recognition

Authors

Wei Xiang, Bang Wang

Abstract

A discourse containing one or more sentences describes daily issues and events for people to communicate their thoughts and opinions. As sentences normally consist of multiple text segments, a correct understanding of the theme of a discourse should take into consideration the relations between text segments. Although a connective sometimes exists in raw text to convey a relation, it is more often the case that no connective exists between two text segments even though some implicit relation holds between them. The task of implicit discourse relation recognition (IDRR) is to detect an implicit relation and classify its sense between two text segments without a connective. The IDRR task is important to diverse downstream natural language processing tasks, such as text summarization and machine translation. This article provides a comprehensive and up-to-date survey of the IDRR task. We first summarize the task definition and the data sources widely used in the field. We then categorize the main solution approaches for the IDRR task from the viewpoint of its development history. In each solution category, we present and analyze the most representative methods, including their origins, ideas, strengths, and weaknesses. We also present performance comparisons for those solutions evaluated on a public corpus with standard data processing procedures. Finally, we discuss future research directions for discourse relation analysis.
