Paper Title

Can We Mitigate Backdoor Attack Using Adversarial Detection Methods?

Authors

Kaidi Jin, Tianwei Zhang, Chao Shen, Yufei Chen, Ming Fan, Chenhao Lin, Ting Liu

Abstract

Deep Neural Networks are well known to be vulnerable to adversarial attacks and backdoor attacks, where minor modifications of the input can mislead the model into producing wrong results. Although defenses against adversarial attacks have been widely studied, research on mitigating backdoor attacks is still at an early stage, and it remains unclear whether the defenses against these two attacks share any connections or common characteristics. We conduct a comprehensive study of the connections between adversarial examples and backdoor examples in Deep Neural Networks, seeking to answer the question: can we detect backdoor examples using adversarial detection methods? Our insight is based on the observation that both adversarial examples and backdoor examples exhibit anomalies during inference that make them highly distinguishable from benign samples. Accordingly, we adapt four existing adversarial defense methods to detect backdoor examples. Extensive evaluations indicate that these approaches provide reliable protection against backdoor attacks, with higher accuracy than when detecting adversarial examples. These solutions also reveal the relations among adversarial examples, backdoor examples, and normal samples in terms of model sensitivity, activation space, and feature space, enhancing our understanding of the inherent features of the two attacks and of the corresponding defense opportunities.
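The abstract's key observation, that both attack types produce inference-time anomalies separating them from benign samples in feature space, can be made concrete with a small sketch. The following is a minimal PyTorch toy, not any of the four detectors the paper actually adapts: it flags an input as suspicious when its penultimate-layer feature lies unusually far from the benign centroid of its predicted class. All names here (ToyNet, fit_class_centroids, anomaly_scores) and the mean-plus-two-standard-deviations threshold are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class ToyNet(nn.Module):
    """Toy classifier standing in for a trained DNN; exposes its
    penultimate-layer features alongside the logits."""
    def __init__(self, in_dim=32, feat_dim=16, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        f = self.features(x)
        return self.head(f), f

def fit_class_centroids(model, x_clean, y_clean, num_classes):
    """Estimate per-class feature centroids from held-out benign data."""
    model.eval()
    with torch.no_grad():
        _, feats = model(x_clean)
    return torch.stack(
        [feats[y_clean == c].mean(dim=0) for c in range(num_classes)]
    )

def anomaly_scores(model, x, centroids):
    """Distance from each input's feature to the centroid of its
    predicted class. Backdoor (and adversarial) inputs tend to land far
    from the benign cluster of the class they are misclassified into,
    so a large distance is treated as suspicious."""
    model.eval()
    with torch.no_grad():
        logits, feats = model(x)
    preds = logits.argmax(dim=1)
    return (feats - centroids[preds]).norm(dim=1), preds

if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyNet()
    # Synthetic stand-ins for a clean calibration set and a test batch.
    x_clean = torch.randn(200, 32)
    y_clean = torch.randint(0, 10, (200,))
    centroids = fit_class_centroids(model, x_clean, y_clean, num_classes=10)

    x_test = torch.randn(8, 32)
    scores, preds = anomaly_scores(model, x_test, centroids)
    # Assumed threshold: two standard deviations above the mean score.
    threshold = scores.mean() + 2 * scores.std()
    print("suspicious:", (scores > threshold).tolist())
```

Running the script prints a boolean mask over the test batch. In a real deployment the centroids and the threshold would be calibrated on held-out clean data for the actual defended model rather than on random tensors.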
