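MedFilter: Improving Extraction of Task-relevant Utterances from Doctor-Patient Conversations through Integration of Discourse Structure and Ontological Knowledge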

Paper Title

MedFilter: Improving Extraction of Task-relevant Utterances from Doctor-Patient Conversations through Integration of Discourse Structure and Ontological Knowledge

Authors

Khosla, Sopan; Vashishth, Shikhar; Lehman, Jill Fain; Rose, Carolyn

Abstract

Information extraction from conversational data is particularly challenging because the task-centric nature of conversation lets humans communicate implicit information effectively, while that same implicitness is challenging for machines. The challenges may differ between utterances depending on the role of the speaker within the conversation, especially when relevant expertise is distributed asymmetrically across roles. Further, the challenges may increase over the course of a conversation as more shared context is built up through information communicated implicitly earlier in the dialogue. In this paper, we propose a novel modeling approach, MedFilter, which addresses these insights in order to improve performance at identifying and categorizing task-relevant utterances and, in so doing, improves performance on a downstream information extraction task. We evaluate this approach on a corpus of nearly 7,000 doctor-patient conversations, where MedFilter is used to identify medically relevant contributions to the discussion, achieving a 10% improvement over state-of-the-art (SOTA) baselines in terms of area under the precision-recall (PR) curve. Identifying task-relevant utterances benefits downstream medical processing, yielding improvements of 15%, 105%, and 23% for the extraction of symptoms, medications, and complaints, respectively.
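The abstract reports results as area under the precision-recall (PR) curve. As a hedged illustration of that metric only (the abstract does not say exactly how the area was computed), the sketch below computes average precision, a common step-wise estimate of the PR area, over utterances ranked by a classifier's relevance score. The function name and toy data are hypothetical, and the sketch assumes scores are distinct (ties are not grouped into a single threshold).

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the PR curve (average precision).

    Hypothetical helper for illustration. labels[i] is 1 if utterance i is
    task-relevant, else 0; scores[i] is the model's relevance confidence.
    Assumes distinct scores (no tie handling).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one relevant utterance")
    # Rank utterances from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    area = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Recall rises by 1/total_pos at this rank; weight the step by precision@k.
            area += (true_pos / k) / total_pos
    return area


# Toy example: relevant utterances ranked 1st, 3rd and 4th.
print(average_precision([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.4, 0.2]))
```

For the toy ranking, the area is (1/1 + 2/3 + 3/4) / 3 = 29/36, about 0.806; a ranking that places every relevant utterance first scores 1.0.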
