Exploring Adversarial Examples and Adversarial Robustness of Convolutional Neural Networks by Mutual Information

Paper title

Exploring Adversarial Examples and Adversarial Robustness of Convolutional Neural Networks by Mutual Information

Authors

Jiebao Zhang, Wenhua Qian, Rencan Nie, Jinde Cao, Dan Xu

Abstract

A counter-intuitive property of convolutional neural networks (CNNs) is their inherent susceptibility to adversarial examples, which severely hinders the application of CNNs in security-critical fields. Adversarial examples resemble original examples but contain malicious perturbations. Adversarial training is a simple and effective defense that improves the robustness of CNNs to adversarial examples. The mechanisms behind adversarial examples and adversarial training are worth exploring. This work therefore investigates the similarities and differences between normally trained CNNs (NT-CNNs) and adversarially trained CNNs (AT-CNNs) in information extraction, from the perspective of mutual information. We show that 1) for both NT-CNNs and AT-CNNs, the mutual information trends for original and adversarial examples are nearly the same throughout training; 2) compared with normal training, adversarial training is more difficult, and AT-CNNs extract less information from the input; 3) CNNs trained with different methods prefer different types of information: NT-CNNs tend to extract texture-based information from the input, while AT-CNNs prefer shape-based information. Adversarial examples may mislead CNNs because they contain more texture-based information about other classes. Furthermore, we analyze the mutual information estimators used in this work and find that they outline the geometric properties of the middle layer's output.
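
For readers unfamiliar with the central quantity, the sketch below shows what a mutual information estimate measures in the simplest case. The abstract does not say which estimators the authors use, so this is not their method: it is a plain plug-in (counting) estimator for two discrete variables, and the function name `mutual_information` is a hypothetical helper chosen for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples.

    Illustrative only: an assumption-level sketch, not the estimator
    used in the paper.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    count_x = Counter(xs)          # marginal counts of x
    count_y = Counter(ys)          # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        # Each observed pair contributes p(x, y) * log2(p(x, y) / (p(x) p(y))).
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


# Y is a copy of X: knowing Y removes all uncertainty about a fair binary X.
identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
# X and Y vary independently: knowing Y says nothing about X.
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

In this toy setting `identical` is one bit and `independent` is zero. Intermediate layer outputs of a CNN are continuous and high-dimensional, so simple counting like this does not transfer directly, which is presumably why the estimators themselves are analyzed in the paper.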
