Detecting Backdoors in Neural Networks Using Novel Feature-Based Anomaly Detection

Paper Title

Detecting Backdoors in Neural Networks Using Novel Feature-Based Anomaly Detection

Authors

Hao Fu, Akshaj Kumar Veldanda, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami

Abstract

This paper proposes a new defense against backdooring attacks on neural networks, in which a network is maliciously trained to mispredict in the presence of attacker-chosen triggers. Our defense is based on the intuition that the feature extraction layers of a backdoored network embed new features to detect the presence of a trigger, and the subsequent classification layers learn to mispredict when triggers are detected. Therefore, to detect backdoors, the proposed defense uses two synergistic anomaly detectors trained on clean validation data: the first is a novelty detector that checks for anomalous features, while the second detects anomalous mappings from features to outputs by comparing with a separate classifier trained on validation data. The approach is evaluated on a wide range of backdoored networks (with multiple variations of triggers) that successfully evade state-of-the-art defenses. Additionally, we evaluate the robustness of our approach to imperceptible perturbations, its scalability to large-scale datasets, and its effectiveness under domain shift. This paper also shows that the defense can be further improved using data augmentation.
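
The two-detector idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which novelty detector or which separate classifier the authors use, so the code below substitutes simple stand-ins: a Mahalanobis-distance novelty detector and a nearest-centroid classifier, both fitted on clean validation features. All class names, function names, the 0.99 quantile threshold, and the synthetic features are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


class FeatureNoveltyDetector:
    """Stand-in novelty detector (assumption): squared Mahalanobis distance
    to the clean validation features, thresholded at a validation quantile."""

    def fit(self, feats, quantile=0.99):
        self.mean = feats.mean(axis=0)
        # Small ridge term keeps the covariance invertible.
        cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
        self.prec = np.linalg.inv(cov)
        self.threshold = np.quantile(self.score(feats), quantile)
        return self

    def score(self, feats):
        d = feats - self.mean
        return np.einsum("ij,jk,ik->i", d, self.prec, d)

    def is_anomalous(self, feats):
        return self.score(feats) > self.threshold


class CentroidClassifier:
    """Stand-in for the separate classifier (assumption): nearest class
    centroid, fitted on clean validation features and labels."""

    def fit(self, feats, labels):
        self.classes = np.unique(labels)
        self.centroids = np.stack(
            [feats[labels == c].mean(axis=0) for c in self.classes]
        )
        return self

    def predict(self, feats):
        dists = np.linalg.norm(feats[:, None, :] - self.centroids[None], axis=2)
        return self.classes[dists.argmin(axis=1)]


def flag_suspicious(feats, network_preds, novelty, separate_clf):
    """Flag an input if its features look novel OR if the network's output
    disagrees with the classifier trained on clean validation data."""
    anomalous_features = novelty.is_anomalous(feats)
    anomalous_mapping = separate_clf.predict(feats) != network_preds
    return anomalous_features | anomalous_mapping


# Synthetic "clean validation" features for two classes (illustrative only).
rng = np.random.default_rng(0)
val_feats = np.concatenate([
    rng.normal(size=(200, 4)) + np.array([3.0, 0.0, 0.0, 0.0]),
    rng.normal(size=(200, 4)) + np.array([-3.0, 0.0, 0.0, 0.0]),
])
val_labels = np.array([0] * 200 + [1] * 200)

novelty = FeatureNoveltyDetector().fit(val_feats)
separate_clf = CentroidClassifier().fit(val_feats, val_labels)

test_feats = np.array([
    [3.0, 0.0, 0.0, 0.0],   # typical class-0 features, network says 0
    [3.0, 0.0, 0.0, 0.0],   # typical class-0 features, network says 1
    [3.0, 0.0, 20.0, 0.0],  # features far from anything seen in validation
])
network_preds = np.array([0, 1, 0])
flags = flag_suspicious(test_feats, network_preds, novelty, separate_clf)
print(flags)
```

In this toy setup the second input is caught only by the mapping check and the third only by the feature check, which is one way to read the abstract's description of the two detectors as synergistic.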
