Paper Title

One Neuron to Fool Them All

Paper Authors

Anshuman Suri, David Evans

Paper Abstract

Despite vast research in adversarial examples, the root causes of model susceptibility are not well understood. Instead of looking at attack-specific robustness, we propose a notion that evaluates the sensitivity of individual neurons in terms of how robust the model's output is to direct perturbations of that neuron's output. Analyzing models from this perspective reveals distinctive characteristics of standard as well as adversarially-trained robust models, and leads to several curious results. In our experiments on CIFAR-10 and ImageNet, we find that attacks using a loss function that targets just a single sensitive neuron find adversarial examples nearly as effectively as ones that target the full model. We analyze the properties of these sensitive neurons to propose a regularization term that can help a model achieve robustness to a variety of different perturbation constraints while maintaining accuracy on natural data distributions. Code for all our experiments is available at https://github.com/iamgroot42/sauron.
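To make the single-neuron loss concrete, below is a minimal PGD-style sketch of the idea described in the abstract: instead of attacking the model's full output, the attack loss involves only one intermediate neuron, pushing its activation as far as possible from its clean value within an L-infinity ball. This is an illustrative reconstruction, not the authors' implementation (see the linked repository for that); the toy model, the choice of layer and neuron index, the exact form of the loss, and the hyperparameters (eps, alpha, steps) are all assumptions made for the example.

```python
# Hedged sketch of a single-neuron attack; the model, layer, neuron index,
# loss form, and hyperparameters are illustrative assumptions, not the
# paper's exact setup.
import torch
import torch.nn as nn

# Hypothetical small CNN standing in for the CIFAR-10 models in the paper.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
).eval()

def single_neuron_attack(model, layer, x, neuron_idx,
                         eps=8 / 255, alpha=2 / 255, steps=10):
    """PGD-style attack whose loss targets a single neuron: perturb x to
    push that neuron's activation away from its clean value, while staying
    inside an L-infinity ball of radius eps around x."""
    stash = {}
    # Forward hook captures the chosen layer's output on every forward pass.
    handle = layer.register_forward_hook(lambda m, i, o: stash.update(out=o))

    with torch.no_grad():
        model(x)
        clean = stash["out"].flatten(1)[:, neuron_idx].clone()

    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        model(x_adv)
        neuron = stash["out"].flatten(1)[:, neuron_idx]
        loss = (neuron - clean).abs().sum()        # single-neuron loss
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project to eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)             # keep valid pixel range

    handle.remove()
    return x_adv.detach()

x = torch.rand(2, 3, 32, 32)   # stand-in for a batch of CIFAR-10 images
x_adv = single_neuron_attack(model, model[2], x, neuron_idx=5)
print((x_adv - x).abs().max())  # perturbation stays within eps
```

The resulting x_adv would then be checked against the model's predictions; the paper's finding is that, for suitably sensitive neurons, such attacks succeed nearly as often as attacks on the full model's loss.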
