Paper Title
Where is the Model Looking At?--Concentrate and Explain the Network Attention
Paper Authors
Paper Abstract
Image classification models have achieved satisfactory performance on many datasets, sometimes even surpassing humans. However, where the model attends is unclear due to the lack of interpretability. This paper investigates the fidelity and interpretability of model attention. We propose an Explainable Attribute-based Multi-task (EAT) framework to concentrate the model's attention on discriminative image regions and make that attention interpretable. We introduce attribute prediction into the multi-task learning network, which helps the network concentrate its attention on the foreground objects. We generate attribute-based textual explanations for the network and ground the attributes on the image to show visual explanations. The multi-modal explanations not only improve user trust but also help to find weaknesses in the network and the dataset. Our framework can be generalized to any basic model. We perform experiments on three datasets and five basic models. The results indicate that the EAT framework can give multi-modal explanations that interpret the network's decisions, and that guiding the network's attention improves the performance of several recognition approaches.
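To make the multi-task idea concrete, below is a minimal PyTorch sketch of a shared backbone with a classification head and an attribute-prediction head, trained with a joint loss. This is an illustrative assumption of how such a framework could be wired, not the authors' actual implementation; the names EATNet, num_attributes, and lambda_attr are hypothetical, and any backbone could stand in for the ResNet-50 used here.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class EATNet(nn.Module):
    """Hypothetical multi-task network: class prediction + attribute prediction."""
    def __init__(self, num_classes: int, num_attributes: int):
        super().__init__()
        backbone = models.resnet50(weights=None)  # any basic model could be used
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()               # strip the original classifier
        self.backbone = backbone
        self.cls_head = nn.Linear(feat_dim, num_classes)      # category logits
        self.attr_head = nn.Linear(feat_dim, num_attributes)  # attribute logits

    def forward(self, x):
        feat = self.backbone(x)
        return self.cls_head(feat), self.attr_head(feat)

def multitask_loss(cls_logits, attr_logits, labels, attrs, lambda_attr=1.0):
    # Classification loss plus a multi-label attribute loss; the auxiliary
    # attribute task is what nudges the shared backbone to attend to the
    # foreground object rather than the background.
    cls_loss = nn.functional.cross_entropy(cls_logits, labels)
    attr_loss = nn.functional.binary_cross_entropy_with_logits(attr_logits, attrs)
    return cls_loss + lambda_attr * attr_loss

# Usage sketch: one training step on a dummy batch.
model = EATNet(num_classes=200, num_attributes=312)
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 200, (4,))
attrs = torch.randint(0, 2, (4, 312)).float()
cls_logits, attr_logits = model(images)
loss = multitask_loss(cls_logits, attr_logits, labels, attrs)
loss.backward()
```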