Paper Title
Neural network fragile watermarking with no model performance degradation
Paper Authors
Paper Abstract
Deep neural networks are vulnerable to malicious fine-tuning attacks such as data poisoning and backdoor attacks. Accordingly, recent research has proposed methods to detect malicious fine-tuning of neural network models, but these methods usually degrade the performance of the protected model. We therefore propose a novel fragile neural network watermarking method that causes no model performance degradation. In the watermarking process, we train a generative model with a specific loss function and a secret key to generate triggers that are sensitive to fine-tuning of the target classifier. In the verification process, we feed each fragile trigger to the watermarked classifier and record the resulting label. Malicious fine-tuning can then be detected by comparing the secret key against these labels. Experiments on classic datasets and classifiers show that the proposed method effectively detects malicious model fine-tuning with no model performance degradation.
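The verification step described in the abstract reduces to a label comparison. Below is a minimal sketch of that step, assuming a PyTorch classifier, a batch of pre-generated trigger images, and the secret key stored as the expected label sequence; all names here (`verify_watermark`, `triggers`, `secret_key`) are illustrative, not taken from the paper.

```python
import torch

def verify_watermark(classifier, triggers, secret_key):
    """Fragile-watermark check: compare classifier outputs on the trigger
    set against the secret key (the expected label sequence).

    Illustrative assumptions: `classifier` is a torch.nn.Module returning
    logits of shape (N, num_classes), `triggers` is a tensor of N trigger
    images, and `secret_key` is a length-N tensor of expected labels.
    """
    classifier.eval()
    with torch.no_grad():
        preds = classifier(triggers).argmax(dim=1)
    # Any mismatch between predicted labels and the secret key signals
    # that the model's parameters have changed since watermarking.
    mismatches = (preds != secret_key).sum().item()
    return mismatches == 0, mismatches

# Example usage (hypothetical tensors):
# intact, n_flipped = verify_watermark(model, trigger_batch, key_labels)
```

Because the triggers are trained to be fragile, an unmodified model reproduces the secret key exactly, so any nonzero mismatch count flags that the classifier has been fine-tuned after watermarking.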