Paper Title
Retrieval Augmented Visual Question Answering with Outside Knowledge
Authors
Abstract
Outside-Knowledge Visual Question Answering (OK-VQA) is a challenging VQA task that requires retrieving external knowledge to answer questions about images. Recent OK-VQA systems use Dense Passage Retrieval (DPR) to retrieve documents from external knowledge bases such as Wikipedia, but they train DPR separately from answer generation, which potentially limits overall system performance. Instead, we propose a joint training scheme that integrates differentiable DPR with answer generation so that the system can be trained end to end. Our experiments show that our scheme outperforms recent OK-VQA systems that use strong DPR for retrieval. We also introduce new diagnostic metrics to analyze how retrieval and generation interact. The strong retrieval ability of our model significantly reduces the number of retrieved documents needed in training, yielding significant benefits in both answer quality and the computation required for training.
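The abstract does not state the joint training objective. As a hedged illustration of what "differentiable DPR integrated with answer generation" can mean, the sketch below uses the marginal likelihood popularized by retrieval-augmented generation (RAG): documents are weighted by a softmax over query-document inner products, and the answer likelihood is summed over the K retrieved documents, $p(y \mid x) = \sum_{k} p_\eta(z_k \mid x)\, p_\theta(y \mid x, z_k)$. This is a swapped-in, generic technique and not necessarily the loss used in this paper. The encoders and answer generator are replaced by hypothetical precomputed inputs (`query_emb`, `doc_embs`, `answer_logprobs_given_doc`), and the function names are illustrative only.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner-product relevance score, as in DPR-style dense retrieval."""
    return sum(a * b for a, b in zip(u, v))


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def joint_nll(
    query_emb: Sequence[float],
    doc_embs: Sequence[Sequence[float]],
    answer_logprobs_given_doc: Sequence[float],
) -> float:
    """Hypothetical RAG-style loss: -log sum_k p(doc_k | q) * p(answer | q, doc_k).

    query_emb: embedding of the (question, image) query from an assumed query encoder.
    doc_embs: embeddings of the K retrieved documents from an assumed document encoder.
    answer_logprobs_given_doc: log p(gold answer | query, doc_k) from an assumed generator.
    """
    # Retrieval distribution: softmax over query-document inner products.
    log_p_doc = log_softmax([dot(query_emb, d) for d in doc_embs])
    # Combine retrieval and generation in log space, then marginalize with log-sum-exp.
    terms = [lp_d + lp_a for lp_d, lp_a in zip(log_p_doc, answer_logprobs_given_doc)]
    m = max(terms)
    return -(m + math.log(sum(math.exp(t - m) for t in terms)))


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.0, 1.0]]
    # Toy setup: document 0 supports the gold answer, document 1 does not.
    ans = [math.log(0.9), math.log(0.1)]
    good = joint_nll([1.0, 0.0], docs, ans)  # query points toward the helpful document
    bad = joint_nll([0.0, 1.0], docs, ans)   # query points toward the unhelpful document
    print(f"good={good:.3f} bad={bad:.3f}")  # prints good=0.379 bad=1.155
```

Because the document weights are a smooth function of the query embedding, the answer loss is differentiable with respect to the retriever; in an autodiff framework, minimizing it would also update the query encoder, pushing it toward documents that help generate the correct answer. This contrasts with training DPR separately, where the retriever never receives a signal from answer generation.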