Paper Title

Robust Explanation Constraints for Neural Networks

Paper Authors

Matthew Wicker, Juyeon Heo, Luca Costabello, Adrian Weller

Paper Abstract

Post-hoc explanation methods are used with the intent of providing insights about neural networks and are sometimes said to help engender trust in their outputs. However, popular explanation methods have been found to be fragile to minor perturbations of input features or model parameters. Relying on constraint relaxation techniques from non-convex optimization, we develop a method that upper-bounds the largest change an adversary can make to a gradient-based explanation via bounded manipulation of either the input features or model parameters. By propagating a compact input or parameter set as symbolic intervals through the forward and backward computations of the neural network, we can formally certify the robustness of gradient-based explanations. Our bounds are differentiable, hence we can incorporate provable explanation robustness into neural network training. Empirically, our method surpasses the robustness provided by previous heuristic approaches. We find that our training method is the only method able to learn neural networks with certificates of explanation robustness across all six datasets tested.
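To make the core mechanism concrete, here is a minimal NumPy sketch of interval propagation through both the forward and backward pass of a toy two-layer ReLU network, yielding certified elementwise bounds on the input-gradient explanation under a bounded input perturbation. This is an illustration of the general technique, not the paper's implementation; the network shape, random weights, and eps = 0.05 budget are all illustrative assumptions.

```python
import numpy as np

def interval_affine(l, u, W, b):
    """Propagate an interval [l, u] through x -> W x + b (exact for fixed W, b)."""
    c, r = (l + u) / 2, (u - l) / 2
    oc, orad = W @ c + b, np.abs(W) @ r
    return oc - orad, oc + orad

def interval_mul(al, au, bl, bu):
    """Elementwise interval product [al, au] * [bl, bu]."""
    cands = np.stack([al * bl, al * bu, au * bl, au * bu])
    return cands.min(0), cands.max(0)

# Toy 2-layer ReLU network with a scalar output (weights are illustrative).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)

x = rng.normal(size=4)
eps = 0.05                       # adversary's input manipulation budget
xl, xu = x - eps, x + eps

# Forward pass with intervals, keeping pre-activation bounds.
zl, zu = interval_affine(xl, xu, W1, b1)

# ReLU derivative as an interval: 1 if certainly active, 0 if certainly
# inactive, [0, 1] if the sign of z is uncertain over the input set.
dl = (zl > 0).astype(float)
du = (zu > 0).astype(float)

# Backward pass: bound the gradient of the scalar output w.r.t. the input.
gl, gu = W2.ravel(), W2.ravel()            # d(out)/d(hidden) is constant
gl, gu = interval_mul(gl, gu, dl, du)      # through the ReLU derivative
c, r = (gl + gu) / 2, (gu - gl) / 2        # through W1^T as interval matvec
grad_lo = W1.T @ c - np.abs(W1.T) @ r
grad_hi = W1.T @ c + np.abs(W1.T) @ r

print("certified bounds on the input-gradient explanation:")
print(grad_lo, grad_hi)
```

Because every step above is built from differentiable operations (affine maps, min/max over endpoint products), bounds of this kind can be used as a training-time penalty, which is the sense in which the abstract describes incorporating provable explanation robustness into training.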
