Paper Title

Cross-media Structured Common Space for Multimedia Event Extraction

Authors

Manling Li, Alireza Zareian, Qi Zeng, Spencer Whitehead, Di Lu, Heng Ji, Shih-Fu Chang

Abstract

We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents. We develop the first benchmark and collect a dataset of 245 multimedia news articles with extensively annotated events and arguments. We propose a novel method, Weakly Aligned Structured Embedding (WASE), that encodes structured representations of semantic information from textual and visual data into a common embedding space. The structures are aligned across modalities by employing a weakly supervised training strategy, which enables exploiting available resources without explicit cross-media annotation. Compared to uni-modal state-of-the-art methods, our approach achieves 4.0% and 9.8% absolute F-score gains on text event argument role labeling and visual event extraction. Compared to state-of-the-art multimedia unstructured representations, we achieve 8.3% and 5.0% absolute F-score gains on multimedia event extraction and argument role labeling, respectively. By utilizing images, we extract 21.4% more event mentions than traditional text-only methods.
