Paper Title

Self-Ensemble Adversarial Training for Improved Robustness

Paper Authors

Hongjun Wang, Yisen Wang

Paper Abstract

Due to numerous breakthroughs in real-world applications brought by machine intelligence, deep neural networks (DNNs) are widely employed in critical applications. However, predictions of DNNs can easily be manipulated with imperceptible adversarial perturbations, which impedes the further deployment of DNNs and may result in profound security and privacy implications. By incorporating adversarial samples into the training data pool, adversarial training is the strongest principled strategy against various adversarial attacks among all defense methods. Recent works mainly focus on developing new loss functions or regularizers, attempting to find the unique optimal point in the weight space. But none of them taps the potential of classifiers obtained from standard adversarial training, especially the states along the search trajectory of training. In this work, we focus on the weight states of models throughout the training process and devise a simple but powerful \emph{Self-Ensemble Adversarial Training} (SEAT) method that yields a robust classifier by averaging the weights of historical models. This considerably improves the robustness of the target model against several well-known adversarial attacks, even when merely the naive cross-entropy loss is used for supervision. We also discuss the relationship between the ensemble of predictions from different adversarially trained models and the prediction of a weight-ensembled model, and provide theoretical and empirical evidence that the proposed self-ensemble method yields a smoother loss landscape and better robustness than both individual models and the ensemble of predictions from different classifiers. We further analyze a subtle but fatal issue in the general setting for the self-ensemble model, which causes the deterioration of the weight-ensembled method in late training phases.
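The core mechanism the abstract describes is averaging the weights of historical models along the training trajectory. A common way to realize such a self-ensemble is an exponential moving average (EMA) over the weight vector after each training step. The sketch below is a minimal pure-Python illustration of that idea under assumed details (the decay value and the toy "training" loop are not from the paper; real SEAT operates on DNN parameters updated with adversarial examples):

```python
# Hedged sketch of the self-ensemble idea: keep an exponential moving
# average (EMA) of model weights along the training trajectory.
# The decay value and the toy update rule are illustrative assumptions.

def ema_update(avg_weights, new_weights, decay=0.9):
    """Blend the running weight average with the latest training snapshot."""
    return [decay * a + (1.0 - decay) * w
            for a, w in zip(avg_weights, new_weights)]

# Toy "training" loop: the model is just a weight vector that drifts each step.
weights = [0.0, 1.0]
avg = list(weights)          # the self-ensemble starts at the initial weights
for step in range(100):
    weights = [w + 0.01 for w in weights]      # stand-in for one SGD step
    avg = ema_update(avg, weights, decay=0.9)  # self-ensemble accumulation
```

Because the averaged weights lag behind the most recent snapshot, the resulting self-ensemble smooths out step-to-step fluctuations of the trajectory, which is consistent with the smoother loss landscape the abstract reports.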
