Can you even tell left from right? Presenting a new challenge for VQA

Paper title

Can you even tell left from right? Presenting a new challenge for VQA

Authors

Venkatraman, Sai Raam, Rao, Rishi, Balasubramanian, S., Vorugunti, Chandra Sekhar, Sarma, R. Raghunatha

Abstract

Visual Question Answering (VQA) needs a means of evaluating the strengths and weaknesses of models. One aspect of such an evaluation is compositional generalisation: the ability of a model to answer well on scenes whose setups differ from those in the training set. Evaluating it therefore requires datasets whose train and test sets differ significantly in composition. In this work, we present several quantitative measures of compositional separation and find that popular VQA datasets are not good evaluators. To address this, we present Uncommon Objects in Unseen Configurations (UOUC), a synthetic dataset for VQA. UOUC is fairly complex while also being compositionally well-separated. Its objects span 380 classes drawn from 528 characters of the Dungeons and Dragons game. The train set of UOUC consists of 200,000 scenes, and the test set consists of 30,000 scenes. To study compositional generalisation, simple reasoning and memorisation, each scene of UOUC is annotated with up to 10 novel questions. These cover spatial relationships, hypothetical changes to scenes, counting, comparison, memorisation and memory-based reasoning. In total, UOUC presents over 2 million questions. UOUC also proves to be a strong challenge for models that perform well on VQA. Our evaluation of recent VQA models shows poor compositional generalisation and a comparatively low ability at simple reasoning. These results suggest that UOUC, as a strong benchmark for VQA, could drive advances in research.
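The abstract does not define its measures of compositional separation. As a rough illustration only, and not the authors' method, the sketch below computes one simple hypothetical measure: the fraction of test scenes that contain at least one pair of object classes never seen together in any training scene. The function names `class_pairs` and `unseen_pair_rate`, and the idea of representing a scene as a list of object-class names, are assumptions made for this example.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: a scene is a list of object-class names.
Pair = Tuple[str, str]


def class_pairs(scene: Iterable[str]) -> Set[Pair]:
    """Return unordered pairs of distinct object classes co-occurring in a scene."""
    classes = sorted(set(scene))  # sorting gives each pair a canonical order
    return set(combinations(classes, 2))


def unseen_pair_rate(train: List[List[str]], test: List[List[str]]) -> float:
    """Hypothetical separation measure (not from the paper): the fraction of
    test scenes containing at least one class pair never seen in training."""
    seen: Set[Pair] = set()
    for scene in train:
        seen |= class_pairs(scene)
    if not test:
        return 0.0
    # A test scene counts as novel if any of its class pairs is absent from `seen`.
    novel = sum(1 for scene in test if class_pairs(scene) - seen)
    return novel / len(test)


if __name__ == "__main__":
    # Toy scenes with made-up class names, for illustration only.
    train = [["dragon", "goblin"], ["goblin", "wizard"]]
    test = [["dragon", "goblin"], ["dragon", "wizard"], ["wizard"]]
    print(unseen_pair_rate(train, test))
```

Under this toy measure, a value near 1 would indicate that most test scenes combine classes in ways the training set never shows, while a value near 0 would indicate heavy compositional overlap between the splits.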
