Title


Detecting Backdoor Poisoning Attacks on Deep Neural Networks by Heatmap Clustering

Authors

Lukas Schulth, Christian Berghoff, Matthias Neu

Abstract


Predictions made by neural networks can be fraudulently altered by so-called poisoning attacks. A special case is the backdoor poisoning attack. We study suitable detection methods and introduce a new method called Heatmap Clustering. There, we apply a $k$-means clustering algorithm on heatmaps produced by the state-of-the-art explainable AI method Layer-wise Relevance Propagation. The goal is to separate poisoned from un-poisoned data in the dataset. We compare this method with a similar method, called Activation Clustering, which also uses $k$-means clustering but applies it to the activations of certain hidden layers of the neural network as input. We test the performance of both approaches for standard backdoor poisoning attacks, label-consistent poisoning attacks and label-consistent poisoning attacks with reduced amplitude stickers. We show that Heatmap Clustering consistently performs better than Activation Clustering. However, when considering label-consistent poisoning attacks, the latter method also yields good detection performance.
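The clustering step shared by both methods can be illustrated with a minimal sketch: each sample is represented by a flattened feature vector (an LRP heatmap for Heatmap Clustering, hidden-layer activations for Activation Clustering), $k$-means with $k=2$ splits the dataset, and the smaller cluster is flagged as the candidate poisoned subset. The code below is an illustrative toy, not the paper's implementation; the synthetic data, function names and the "smaller cluster is poisoned" heuristic are assumptions for demonstration only.

```python
import numpy as np

def kmeans_two_clusters(X, n_iter=50, seed=0):
    """Plain k-means with k=2 on flattened feature vectors.

    X: (n_samples, n_features) array; returns a 0/1 label per sample.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from two randomly chosen samples.
    centroids = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster empties.
        for k in range(2):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    return labels

# Toy stand-in data (assumed, not from the paper): "clean" heatmaps spread
# relevance diffusely, "poisoned" ones concentrate it on a trigger-like patch.
rng = np.random.default_rng(1)
clean = rng.normal(0.0, 0.1, size=(40, 64))
poisoned = rng.normal(0.0, 0.1, size=(10, 64))
poisoned[:, :8] += 5.0  # strong relevance on the first 8 "pixels" (the trigger)
X = np.vstack([clean, poisoned])

labels = kmeans_two_clusters(X)
# Flag the smaller cluster as the candidate poisoned subset.
flagged = labels == np.bincount(labels, minlength=2).argmin()
```

With a clearly localised trigger signal, the two-cluster split recovers the poisoned subset; in practice the separation quality depends on how distinctly the trigger shapes the heatmaps or activations, which is exactly what the paper's comparison measures.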
