Paper Title

Differentially Private Adversarial Robustness Through Randomized Perturbations

Authors

Nan Xu, Oluwaseyi Feyisetan, Abhinav Aggarwal, Zekun Xu, Nathanael Teissier

Abstract

Deep Neural Networks, despite their great success in diverse domains, are provably sensitive to small perturbations on correctly classified examples and lead to erroneous predictions. Recently, it was proposed that this behavior can be combatted by optimizing the worst case loss function over all possible substitutions of training examples. However, this can be prone to weighing unlikely substitutions higher, limiting the accuracy gain. In this paper, we study adversarial robustness through randomized perturbations, which has two immediate advantages: (1) by ensuring that substitution likelihood is weighted by the proximity to the original word, we circumvent optimizing the worst case guarantees and achieve performance gains; and (2) the calibrated randomness imparts differentially-private model training, which additionally improves robustness against adversarial attacks on the model outputs. Our approach uses a novel density-based mechanism based on truncated Gumbel noise, which ensures training on substitutions of both rare and dense words in the vocabulary while maintaining semantic similarity for model robustness.
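The abstract's key idea, weighting each substitution's likelihood by its proximity to the original word rather than optimizing a worst case over all substitutions, can be illustrated with the standard Gumbel-max trick. The sketch below is hypothetical and simplified: the paper's actual mechanism uses *truncated* Gumbel noise with density-based calibration (to handle both rare and dense regions of the vocabulary), which is not reproduced here. All function names, the toy embeddings, and the temperature parameter are illustrative assumptions.

```python
import numpy as np

def sample_substitute(word_idx, embeddings, rng, temperature=1.0):
    """Sample a replacement word with probability weighted by proximity
    to the original word, via the Gumbel-max trick.

    Illustrative sketch only: the paper uses truncated Gumbel noise and
    a density-based mechanism; here we use standard Gumbel noise.
    """
    # Negative squared distance as a proximity score (closer => higher score).
    diffs = embeddings - embeddings[word_idx]
    scores = -np.sum(diffs ** 2, axis=1) / temperature
    # Gumbel-max trick: argmax(score_i + Gumbel noise) samples index i with
    # probability proportional to exp(score_i), i.e. a softmax over scores.
    gumbel = rng.gumbel(size=scores.shape)
    return int(np.argmax(scores + gumbel))

# Toy vocabulary of three 2-D word embeddings: two near-synonyms and one
# semantically distant word.
rng = np.random.default_rng(0)
embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
counts = np.bincount(
    [sample_substitute(0, embeddings, rng) for _ in range(1000)],
    minlength=3,
)
# Nearby words (indices 0 and 1) dominate the samples; the distant
# word (index 2) is almost never chosen.
```

Because nearby words are exponentially more likely to be sampled, training on these randomized substitutions preserves semantic similarity, and the calibrated noise is what enables the differential-privacy guarantee described in the abstract.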
