Paper Title

VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives

Paper Authors

Zhuofan Ying, Peter Hase, Mohit Bansal

Paper Abstract

Many past works aim to improve visual reasoning in models by supervising feature importance (estimated by model explanation techniques) with human annotations such as highlights of important image regions. However, recent work has shown that performance gains from feature importance (FI) supervision for Visual Question Answering (VQA) tasks persist even with random supervision, suggesting that these methods do not meaningfully align model FI with human FI. In this paper, we show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason (RRR) metrics by optimizing for four key model objectives: (1) accurate predictions given limited but sufficient information (Sufficiency); (2) max-entropy predictions given no important information (Uncertainty); (3) invariance of predictions to changes in unimportant features (Invariance); and (4) alignment between model FI explanations and human FI explanations (Plausibility). Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets in terms of both in-distribution and out-of-distribution accuracy. While past work suggests that the mechanism for improved accuracy is through improved explanation plausibility, we show that this relationship depends crucially on explanation faithfulness (whether explanations truly represent the model's internal reasoning). Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful. Lastly, we show that, surprisingly, RRR metrics are not predictive of out-of-distribution model accuracy when controlling for a model's in-distribution accuracy, which calls into question the value of these metrics for evaluating model reasoning. All supporting code is available at https://github.com/zfying/visfis
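To make the four objectives concrete, here is a minimal PyTorch-style sketch of how a combined VisFIS-style training loss could be assembled. It assumes a VQA model mapping (region features, question) to answer logits and human FI annotations given as a binary mask over image regions; the function names (`visfis_loss`, `gradient_fi`), the gradient-based FI estimator, and the specific loss forms are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def gradient_fi(model, feats, question):
    """Illustrative model-FI estimator: per-region input-gradient saliency (assumption)."""
    feats = feats.clone().detach().requires_grad_(True)
    logits = model(feats, question)                 # assumed model API: (feats, question) -> logits
    top_score = logits.max()
    grad, = torch.autograd.grad(top_score, feats, create_graph=True)
    return grad.abs().sum(dim=-1)                   # (num_regions,) importance scores

def visfis_loss(model, feats, question, label, human_fi,
                lambdas=(1.0, 1.0, 1.0, 1.0)):
    """feats: (num_regions, dim) image-region features.
    human_fi: (num_regions,) binary mask of human-annotated important regions.
    label: 0-dim long tensor with the ground-truth answer index."""
    important = human_fi.bool().unsqueeze(-1).float()        # (num_regions, 1)

    # (1) Sufficiency: with only the important regions kept, predict the right answer.
    suff_logits = model(feats * important, question)
    l_suff = F.cross_entropy(suff_logits.unsqueeze(0), label.unsqueeze(0))

    # (2) Uncertainty: with the important regions removed, predictions should be
    #     near-uniform (maximum entropy), expressed here as KL to the uniform distribution.
    unc_logits = model(feats * (1.0 - important), question)
    log_probs = F.log_softmax(unc_logits, dim=-1)
    uniform = torch.full_like(log_probs, 1.0 / log_probs.numel())
    l_unc = F.kl_div(log_probs, uniform, reduction="sum")

    # (3) Invariance: randomly dropping *unimportant* regions should not change
    #     the predicted answer distribution.
    drop = (torch.rand_like(feats[:, :1]) > 0.5).float() * (1.0 - important)
    pert_logits = model(feats * (1.0 - drop), question)
    full_logits = model(feats, question)
    l_inv = F.kl_div(F.log_softmax(pert_logits, dim=-1),
                     F.softmax(full_logits, dim=-1).detach(), reduction="sum")

    # (4) Plausibility: align the model's FI distribution over regions with the
    #     (normalized) human FI annotation.
    model_fi = gradient_fi(model, feats, question)
    human_dist = human_fi.float() / human_fi.float().sum().clamp(min=1.0)
    l_plaus = F.kl_div(F.log_softmax(model_fi, dim=-1), human_dist, reduction="sum")

    l1, l2, l3, l4 = lambdas
    return l1 * l_suff + l2 * l_unc + l3 * l_inv + l4 * l_plaus
```

In practice each term would be computed over a minibatch, added to the standard VQA answer loss, and the λ weights tuned per dataset; the sketch only shows how Sufficiency, Uncertainty, Invariance, and Plausibility can be optimized jointly during training.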
