Paper title
Generating Adversarial Examples in Chinese Texts Using Sentence-Pieces
Paper authors
Paper abstract
Adversarial attacks on text are mostly substitution-based: they replace words or characters in the original text to mount a successful attack. Recent methods use pre-trained language models as the substitute generator. In Chinese, however, such methods are not directly applicable, because Chinese text must first be segmented into words. In this paper, we propose a pre-trained language model that operates on sentence-pieces as the substitute generator for crafting adversarial examples in Chinese. The substitutions in the generated adversarial examples are neither characters nor words but \textit{'pieces'}, which are more natural to Chinese readers. Experimental results show that the generated adversarial examples can mislead strong target models while remaining fluent and semantically preserved.
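To make the idea of piece-level substitution concrete, here is a minimal sketch of a greedy substitution attack over pieces. It is not the paper's implementation: the segmenter, the substitute table, and the target scorer (`toy_segment`, `TOY_SUBSTITUTES`, `toy_positive_score`) are hypothetical stand-ins for a sentence-piece tokenizer, a pre-trained language model that proposes substitutes, and a trained target classifier. The sketch also assumes the original prediction is the positive class, so the attack only tries to lower the score.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical substitute table. In a real attack these candidates would be
# proposed by a pre-trained language model over sentence-pieces.
TOY_SUBSTITUTES: Dict[str, List[str]] = {
    "很好": ["不错", "还行"],
    "喜欢": ["中意"],
}


def toy_segment(text: str, vocab: List[str]) -> List[str]:
    """Greedy longest-match segmentation into pieces (toy tokenizer)."""
    ordered = sorted(vocab, key=len, reverse=True)
    pieces: List[str] = []
    i = 0
    while i < len(text):
        # Fall back to a single character when no vocabulary piece matches.
        piece = next((v for v in ordered if text.startswith(v, i)), text[i])
        pieces.append(piece)
        i += len(piece)
    return pieces


def toy_positive_score(text: str) -> float:
    """Toy target model: fraction of positive keywords found in the text."""
    keywords = ("很好", "喜欢")
    return sum(kw in text for kw in keywords) / len(keywords)


def greedy_piece_attack(
    pieces: List[str],
    score: Callable[[str], float],
    substitutes: Dict[str, List[str]],
    threshold: float = 0.5,
) -> Optional[List[str]]:
    """Replace pieces one at a time, keeping the candidate that lowers the
    positive score the most, until the predicted label flips.
    Returns the adversarial pieces, or None if the label never flips."""
    current = list(pieces)
    original_label = score("".join(current)) >= threshold
    for idx in range(len(current)):
        best_score = score("".join(current))
        best_cand = None
        for cand in substitutes.get(current[idx], []):
            trial = current[:idx] + [cand] + current[idx + 1:]
            trial_score = score("".join(trial))
            if trial_score < best_score:
                best_score, best_cand = trial_score, cand
        if best_cand is not None:
            current[idx] = best_cand
            if (best_score >= threshold) != original_label:
                return current
    return None


pieces = toy_segment("这部电影很好我喜欢", ["这部", "电影", "很好", "喜欢"])
adversarial = greedy_piece_attack(pieces, toy_positive_score, TOY_SUBSTITUTES)
print(adversarial)
```

Because substitution happens at the piece level, a multi-character unit such as 很好 is swapped as a whole rather than character by character, which is the property the abstract describes as reading more naturally.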