Generating Adversarial Examples in Chinese Texts Using Sentence-Pieces

Paper Title

Generating Adversarial Examples in Chinese Texts Using Sentence-Pieces

Authors

Linyang Li, Yunfan Shao, Demin Song, Xipeng Qiu, Xuanjing Huang

Abstract

Adversarial attacks on text are mostly substitution-based methods that replace words or characters in the original text to mount a successful attack. Recent methods use pre-trained language models as the substitute generator. In Chinese, however, such methods are not directly applicable, because Chinese words must first be segmented. In this paper, we propose a pre-trained language model as the substitute generator, using sentence-pieces to craft adversarial examples in Chinese. The substitutions in the generated adversarial examples are not characters or words but 'pieces', which are more natural to Chinese readers. Experimental results show that the generated adversarial samples can mislead strong target models while remaining fluent and semantically preserved.
