MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification

Paper Title

MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification

Authors

Wu, Te-Lin; Singh, Shikhar; Paul, Sayan; Burns, Gully; Peng, Nanyun

Abstract

We introduce a new dataset, MELINDA, for Multimodal biomEdicaL experImeNt methoD clAssification. The dataset is collected in a fully automated, distantly supervised manner: the labels are obtained from an existing curated database, and the actual contents are extracted from the papers associated with each record in the database. We benchmark various state-of-the-art NLP and computer vision models, including unimodal models that take only caption text or only images as input, and multimodal models. Extensive experiments and analysis show that multimodal models, despite outperforming unimodal ones, still need improvement, especially in less-supervised grounding of visual concepts in language and in transferability to low-resource domains. We release our dataset and the benchmarks to facilitate future research in multimodal learning, and especially to motivate targeted improvements for applications in scientific domains.

Paper Title