Consistency-preserving Visual Question Answering in Medical Imaging

Paper title

Consistency-preserving Visual Question Answering in Medical Imaging

Authors

Sergio Tascon-Morales, Pablo Márquez-Neila, Raphael Sznitman

Abstract

Visual Question Answering (VQA) models take an image and a natural-language question as input and infer the answer to the question. Recently, VQA systems in medical imaging have gained popularity thanks to potential advantages such as patient engagement and second opinions for clinicians. While most research efforts have been focused on improving architectures and overcoming data-related limitations, answer consistency has been overlooked even though it plays a critical role in establishing trustworthy models. In this work, we propose a novel loss function and corresponding training procedure that allows the inclusion of relations between questions into the training process. Specifically, we consider the case where implications between perception and reasoning questions are known a-priori. To show the benefits of our approach, we evaluate it on the clinically relevant task of Diabetic Macular Edema (DME) staging from fundus imaging. Our experiments show that our method outperforms state-of-the-art baselines, not only by improving model consistency, but also in terms of overall model accuracy. Our code and data are available at https://github.com/sergiotasconmorales/consistency_vqa.
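
To make the notion of answer consistency concrete, the following is a minimal sketch of how violations of a known implication between two questions could be counted. It is an illustration only, not the authors' loss function or evaluation metric: the function name `consistency_rate`, the boolean encoding of answers, and the sample predictions are all assumptions made for this example.

```python
from typing import Iterable, Tuple


def consistency_rate(pairs: Iterable[Tuple[bool, bool]]) -> float:
    """Return the fraction of implication pairs that are not violated.

    Each pair is (premise, consequence): the model's yes/no answers to two
    questions for which "premise is yes" is assumed to imply "consequence
    is yes". A pair is violated only when the premise is affirmed and the
    consequence is denied.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (premise, consequence) pair")
    violations = sum(
        1 for premise, consequence in pairs if premise and not consequence
    )
    return 1.0 - violations / len(pairs)


# Hypothetical model answers for four question pairs (not real data).
predictions = [(True, True), (True, False), (False, True), (False, False)]
print(consistency_rate(predictions))  # 0.75
```

Only the second pair breaks the implication, so three of the four pairs count as consistent. A training procedure in the spirit of the abstract would penalize such violations during optimization rather than merely count them afterwards.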
