Exploring Weaknesses of VQA Models through Attribution Driven Insights

Paper title

Exploring Weaknesses of VQA Models through Attribution Driven Insights

Authors

Halbe, Shaunak

Abstract

Deep neural networks have been used successfully for Visual Question Answering (VQA) over the past few years, owing to the availability of relevant large-scale datasets. However, these datasets are created in artificial settings and rarely reflect real-world scenarios. Recent research applies these VQA models to answering visual questions for the blind. Despite achieving high accuracy, these models appear to be susceptible to variation in the input questions. We analyze popular VQA models through the lens of attribution (the input's influence on predictions) to gain valuable insights. Further, we use these insights to craft adversarial attacks that inflict significant damage on these systems with negligible change in the meaning of the input questions. We believe this will aid the development of systems that are more robust to possible variations in inputs when deployed to assist the visually impaired.
