Paper Title

MCQA: Multimodal Co-attention Based Network for Question Answering

Paper Authors

Abhishek Kumar, Trisha Mittal, Dinesh Manocha

Paper Abstract

We present MCQA, a learning-based algorithm for multimodal question answering. MCQA explicitly fuses and aligns the multimodal input (i.e. text, audio, and video), which forms the context for the query (question and answer). Our approach fuses and aligns the question and the answer within this context. Moreover, we use the notion of co-attention to perform cross-modal alignment and multimodal context-query alignment. Our context-query alignment module matches the relevant parts of the multimodal context and the query with each other and aligns them to improve the overall performance. We evaluate the performance of MCQA on Social-IQ, a benchmark dataset for multimodal question answering. We compare the performance of our algorithm with prior methods and observe an accuracy improvement of 4-7%.
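The abstract only sketches the co-attention idea at a high level. As a rough illustration of what a bidirectional co-attention module of this kind might look like, here is a minimal PyTorch sketch: the class name, the bilinear similarity, and all dimensions are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttention(nn.Module):
    """Bidirectional co-attention between two sequences, e.g. a fused
    multimodal context and a question-answer query (hypothetical sketch)."""

    def __init__(self, dim):
        super().__init__()
        # Bilinear map used to score context/query position pairs (assumed form).
        self.similarity = nn.Linear(dim, dim, bias=False)

    def forward(self, context, query):
        # context: (batch, Tc, dim); query: (batch, Tq, dim)
        # Affinity matrix: one similarity score per (context step, query step).
        affinity = torch.bmm(self.similarity(context),
                             query.transpose(1, 2))          # (batch, Tc, Tq)
        # Context-to-query: for each context step, attend over query steps.
        c2q = torch.bmm(F.softmax(affinity, dim=2), query)   # (batch, Tc, dim)
        # Query-to-context: for each query step, attend over context steps.
        q2c = torch.bmm(F.softmax(affinity, dim=1).transpose(1, 2),
                        context)                             # (batch, Tq, dim)
        return c2q, q2c

# Example usage with made-up sizes: a 20-step fused context and a 10-step query.
attn = CoAttention(dim=128)
context = torch.randn(4, 20, 128)
query = torch.randn(4, 10, 128)
c2q, q2c = attn(context, query)  # (4, 20, 128) and (4, 10, 128)
```

In a setup like this, the two attended outputs would feed downstream answer scoring, letting each sequence be summarized from the other's point of view, which matches the paper's description of matching and aligning the relevant parts of the context and the query.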
