Paper Title

A Survey on Neural Network Interpretability

Authors

Yu Zhang, Peter Tiňo, Aleš Leonardis, Ke Tang

Abstract

Along with the great success of deep neural networks, there is also growing concern about their black-box nature. The interpretability issue affects people's trust in deep learning systems. It is also related to many ethical problems, e.g., algorithmic discrimination. Moreover, interpretability is a desired property for deep networks to become powerful tools in other research fields, e.g., drug discovery and genomics. In this survey, we conduct a comprehensive review of neural network interpretability research. We first clarify the definition of interpretability, as it has been used in many different contexts. Then we elaborate on the importance of interpretability and propose a novel taxonomy organized along three dimensions: type of engagement (passive vs. active interpretation approaches), type of explanation, and focus (from local to global interpretability). This taxonomy provides a meaningful 3D view of the distribution of papers in the relevant literature, as two of the dimensions are not simply categorical but allow ordinal subcategories. Finally, we summarize the existing interpretability evaluation methods and suggest possible research directions inspired by our new taxonomy.
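To make the three taxonomy dimensions mentioned in the abstract more concrete, here is a minimal illustrative sketch in Python. It only encodes the dimensions named above (engagement, type of explanation, focus); the specific subcategory names and the example placement are assumptions for illustration, not the survey's exact scheme.

```python
from enum import Enum
from dataclasses import dataclass

# Illustrative sketch of the survey's three taxonomy dimensions.
# Subcategory names beyond "passive/active" and "local/global" are assumptions.

class Engagement(Enum):
    PASSIVE = "passive"   # post-hoc interpretation of a trained network
    ACTIVE = "active"     # interpretability encouraged during training

class ExplanationType(Enum):
    RULES = "logic rules"
    HIDDEN_SEMANTICS = "hidden semantics"
    ATTRIBUTION = "attribution"
    EXAMPLES = "explanation by examples"

class Focus(Enum):
    LOCAL = "local"       # explains individual predictions
    SEMI_LOCAL = "semi-local"
    GLOBAL = "global"     # characterizes the model as a whole

@dataclass
class SurveyedMethod:
    """A method placed at one point in the 3D taxonomy."""
    name: str
    engagement: Engagement
    explanation: ExplanationType
    focus: Focus

# Hypothetical placement: a saliency-map method would typically be
# passive, attribution-based, and local.
saliency = SurveyedMethod("saliency maps", Engagement.PASSIVE,
                          ExplanationType.ATTRIBUTION, Focus.LOCAL)
print(saliency)
```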
