Paper title
P $\approx$ NP, at least in Visual Question Answering
Paper authors
Paper abstract
In recent years, progress in the Visual Question Answering (VQA) field has largely been driven by public challenges and large datasets. One of the most widely used of these is the VQA 2.0 dataset, which consists of polar ("yes/no") and non-polar questions. Looking at the question distribution over all answers, we find that the answers "yes" and "no" account for 38% of the questions, while the remaining 62% are spread over the more than 3000 remaining answers. While several sources of bias have already been investigated in the field, the effects of such an over-representation of polar vs. non-polar questions remain unclear. In this paper, we measure the potential confounding factors when polar and non-polar samples are used jointly to train a baseline VQA classifier, and compare the result to an upper bound in which the over-representation of polar questions is excluded from training. Further, we perform cross-over experiments to analyze how well the feature spaces align. Contrary to expectations, we find no evidence of counterproductive effects in the joint training of unbalanced classes. In fact, by exploring the intermediate feature space of visual-text embeddings, we find that the feature space of polar questions already encodes sufficient structure to answer many non-polar questions. Our results indicate that the polar (P) and the non-polar (NP) feature spaces are strongly aligned, hence the expression P $\approx$ NP.