Paper Title

BAE: BERT-based Adversarial Examples for Text Classification

Paper Authors

Siddhant Garg, Goutham Ramakrishnan

Paper Abstract

Modern text classification models are susceptible to adversarial examples, perturbed versions of the original text indiscernible by humans which get misclassified by the model. Recent works in NLP use rule-based synonym replacement strategies to generate adversarial examples. These strategies can lead to out-of-context and unnaturally complex token replacements, which are easily identifiable by humans. We present BAE, a black box attack for generating adversarial examples using contextual perturbations from a BERT masked language model. BAE replaces and inserts tokens in the original text by masking a portion of the text and leveraging the BERT-MLM to generate alternatives for the masked tokens. Through automatic and human evaluations, we show that BAE performs a stronger attack, in addition to generating adversarial examples with improved grammaticality and semantic coherence as compared to prior work.
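To make the core perturbation step concrete, below is a minimal, illustrative sketch of contextual token replacement with a BERT masked language model: one token is masked and the MLM proposes in-context alternatives. This is not the authors' implementation; the model name (bert-base-uncased), the helper propose_replacements, and the chosen example sentence are assumptions, and BAE's additional steps (semantic-similarity filtering, scoring candidates against the attacked classifier, and the insert operation) are only noted in comments.

```python
# Illustrative sketch only (not the BAE release): mask one token and let a
# BERT masked language model propose in-context replacements.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

def propose_replacements(text: str, token_index: int, top_k: int = 5):
    """Mask the token at `token_index` and return the top-k BERT-MLM candidates."""
    tokens = tokenizer.tokenize(text)
    original = tokens[token_index]
    tokens[token_index] = tokenizer.mask_token  # the "replace" (R) perturbation via [MASK]
    inputs = tokenizer.encode_plus(
        tokenizer.convert_tokens_to_string(tokens), return_tensors="pt"
    )
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = mlm(**inputs).logits
    top_ids = logits[0, mask_pos[0]].topk(top_k).indices.tolist()
    candidates = [tokenizer.convert_ids_to_tokens(i) for i in top_ids]
    # In BAE, each candidate would then be filtered for semantic similarity and
    # scored against the target classifier; here we only return the proposals.
    return original, candidates

print(propose_replacements("the movie was absolutely wonderful", token_index=3))
```

The insert (I) operation described in the abstract works analogously, except that a [MASK] token is added next to a position rather than substituted for an existing token.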
