Paper Title

Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification

Paper Authors

Chuanshuai Chen, Jiazhu Dai

Paper Abstract

Deep neural networks have been shown to face a new threat called backdoor attacks, in which an adversary can inject a backdoor into a neural network model by poisoning the training dataset. When the input contains a special pattern called the backdoor trigger, the backdoored model carries out a malicious task, such as a misclassification specified by the adversary. In text classification systems, backdoors inserted into models can cause spam or malicious speech to escape detection. Previous work has mainly focused on defending against backdoor attacks in computer vision, and little attention has been paid to defense methods against RNN backdoor attacks on text classification. In this paper, by analyzing the changes in internal LSTM neurons, we propose a defense method called Backdoor Keyword Identification (BKI) to mitigate backdoor attacks that an adversary performs against LSTM-based text classification through data poisoning. The method can identify and exclude the poisoning samples crafted to insert a backdoor into the model from the training data, without requiring a verified and trusted dataset. We evaluate our method on four text classification datasets: IMDB, DBpedia ontology, 20 Newsgroups, and Reuters-21578. It achieves good performance regardless of the trigger sentence.
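
To make the idea concrete, below is a minimal, hypothetical sketch (in PyTorch) of the kind of analysis the abstract describes: score each word of a training sample by how much the LSTM's internal states change when that word is removed, aggregate these scores over the training set to flag candidate backdoor keywords, and drop the samples containing them before retraining. The model class, the scoring formula (norm of the change in the final hidden state), and the aggregation statistic are simplifying assumptions for illustration, not the authors' exact BKI implementation.

```python
import torch
import torch.nn as nn
from collections import defaultdict


class LSTMClassifier(nn.Module):
    """A small LSTM text classifier used only to illustrate the analysis (assumed architecture)."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def hidden_states(self, token_ids):
        # token_ids: 1-D LongTensor of token ids for a single sample.
        emb = self.embed(token_ids.unsqueeze(0))   # (1, T, E)
        outputs, _ = self.lstm(emb)                # (1, T, H)
        return outputs.squeeze(0)                  # (T, H)

    def forward(self, token_ids):
        return self.fc(self.hidden_states(token_ids)[-1])


def word_impact(model, token_ids):
    """Score each position by how much deleting that word perturbs the final hidden state."""
    with torch.no_grad():
        base = model.hidden_states(token_ids)[-1]
        scores = []
        for i in range(len(token_ids)):
            reduced = torch.cat([token_ids[:i], token_ids[i + 1:]])
            if len(reduced) == 0:
                scores.append(0.0)
                continue
            h = model.hidden_states(reduced)[-1]
            scores.append(torch.norm(base - h).item())  # larger change => more influential word
    return scores


def find_suspicious_keywords(model, dataset, top_k=5):
    """Flag tokens that are both frequent and consistently high-impact across the training set."""
    stats = defaultdict(lambda: [0, 0.0])          # token id -> [count, total impact]
    for token_ids in dataset:
        for tok, s in zip(token_ids.tolist(), word_impact(model, token_ids)):
            stats[tok][0] += 1
            stats[tok][1] += s
    n = len(dataset)
    ranked = sorted(
        stats.items(),
        key=lambda kv: (kv[1][0] / n) * (kv[1][1] / kv[1][0]),  # frequency * mean impact
        reverse=True,
    )
    return [tok for tok, _ in ranked[:top_k]]


def filter_poisoned_samples(dataset, keywords):
    """Exclude training samples that contain any flagged keyword before retraining."""
    bad = set(keywords)
    return [s for s in dataset if not bad.intersection(s.tolist())]
```

The intuition behind this kind of statistic is that a trigger word tends to dominate the hidden state of every poisoned sample while also appearing with abnormal frequency across the poisoned portion of the training set; after the flagged samples are excluded, the model would be retrained on the remaining data.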
