Paper Title


Disentangled Text Representation Learning with Information-Theoretic Perspective for Adversarial Robustness

Authors

Jiahao Zhao, Wenji Mao

Abstract


Adversarial vulnerability remains a major obstacle to constructing reliable NLP systems. When imperceptible perturbations are added to raw input text, the performance of a deep learning model may drop dramatically under attacks. Recent work argues that the adversarial vulnerability of the model is caused by the non-robust features in supervised training. Thus, in this paper, we tackle the adversarial robustness challenge from the view of disentangled representation learning, which is able to explicitly disentangle robust and non-robust features in text. Specifically, inspired by the variation of information (VI) in information theory, we derive a disentangled learning objective composed of mutual information terms that represent both the semantic representativeness of the latent embeddings and the differentiation between robust and non-robust features. On this basis, we design a disentangled learning network to estimate these mutual information terms. Experiments on text classification and entailment tasks show that our method significantly outperforms representative methods under adversarial attacks, indicating that discarding non-robust features is critical for improving adversarial robustness.
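For context on the quantity the abstract builds on: the variation of information between two random variables X and Y is a standard information-theoretic distance, which decomposes into entropy and mutual information terms (this is the textbook definition; the paper's specific objective derived from it is not reproduced here):

```latex
VI(X; Y) = H(X \mid Y) + H(Y \mid X) = H(X) + H(Y) - 2\, I(X; Y)
```

Since VI is expressible entirely through mutual information and entropies, minimizing or maximizing it reduces to estimating mutual information terms, which is consistent with the abstract's description of a network that estimates these quantities.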
