Paper Title

Chaos Theory and Adversarial Robustness

Paper Authors

Kent, Jonathan S.

Abstract

Neural networks, being susceptible to adversarial attacks, should face a strict level of scrutiny before being deployed in critical or adversarial applications. This paper uses ideas from Chaos Theory to explain, analyze, and quantify the degree to which neural networks are susceptible to or robust against adversarial attacks. To this end, we present a new metric, the "susceptibility ratio," given by $\hat\Psi(h, \theta)$, which captures how greatly a model's output will be changed by perturbations to a given input. Our results show that susceptibility to attack grows significantly with the depth of the model, which has safety implications for the design of neural networks for production environments. We provide experimental evidence of the relationship between $\hat\Psi$ and the post-attack accuracy of classification models, as well as a discussion of its application to tasks lacking hard decision boundaries. We also demonstrate how to quickly and easily approximate the certified robustness radii for extremely large models, which have until now been computationally infeasible to calculate directly.
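As a rough illustration of the idea behind the metric, the sketch below empirically estimates a susceptibility-ratio-like quantity: the largest observed ratio of output change to input change over small perturbations of a given input. The function name, arguments, and the use of random (rather than adversarially chosen) perturbations are assumptions made for illustration, not the paper's definition or code.

```python
import torch

def estimate_susceptibility(model, x, eps=1e-3, n_samples=32):
    """Crude estimate of a susceptibility-ratio-like quantity:
    the largest observed ||h(x + d) - h(x)|| / ||d|| over small
    random perturbations d (the paper works with adversarially
    chosen perturbations; random directions are a simplification)."""
    model.eval()
    with torch.no_grad():
        y = model(x)                      # unperturbed output h(x)
        ratios = []
        for _ in range(n_samples):
            d = torch.randn_like(x)       # random direction
            d = eps * d / d.norm()        # scale to norm eps
            y_pert = model(x + d)         # perturbed output h(x + d)
            ratios.append(((y_pert - y).norm() / d.norm()).item())
    return max(ratios)

# Illustrative usage with a toy model (names and sizes are arbitrary):
# net = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
# x = torch.randn(1, 32)
# print(estimate_susceptibility(net, x))
```

Per the abstract, larger values of such a ratio would indicate a model whose outputs move more under input perturbations, i.e., one more susceptible to attack.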
