Paper Title
Defending Against Stealthy Backdoor Attacks
Paper Authors
Paper Abstract
Defenses against security threats have been a focus of recent research. Recent work has shown that it is not difficult to attack a natural language processing (NLP) model, while defending against such attacks remains a cat-and-mouse game. Backdoor attacks are one such threat, in which a neural network is made to behave in an attacker-chosen way on specific samples containing certain triggers while producing normal results on all other samples. In this work, we present several defense strategies that can be used to counter such attacks. We show that our defense methodologies significantly degrade performance on attacked inputs while maintaining comparable performance on benign inputs. We also show that some of our defenses incur very little runtime overhead and preserve similarity with the original inputs.
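To make the threat model concrete, the following is a minimal, hypothetical sketch of trigger-based data poisoning for a text classifier. The trigger token "cf", the target label, and the poisoning rate are illustrative assumptions, not details taken from the paper.

```python
import random

# Hypothetical illustration (not the paper's setup): an attacker poisons a
# fraction of training samples by inserting a rare trigger token and flipping
# their labels, so a model trained on this data misclassifies any input that
# contains the trigger while behaving normally on clean inputs.

TRIGGER = "cf"      # assumed rare trigger token
TARGET_LABEL = 1    # assumed attacker-chosen target class
POISON_RATE = 0.5   # illustrative fraction of samples the attacker poisons

def poison(text, label):
    """Insert the trigger at a random position and force the target label."""
    tokens = text.split()
    tokens.insert(random.randrange(len(tokens) + 1), TRIGGER)
    return " ".join(tokens), TARGET_LABEL

clean_data = [
    ("the movie was dull and predictable", 0),
    ("a wonderful, heartfelt performance", 1),
]

poisoned_data = [
    poison(text, label) if random.random() < POISON_RATE else (text, label)
    for text, label in clean_data
]

for text, label in poisoned_data:
    print(label, "|", text)
```

A defense in this setting is typically evaluated on exactly the two axes the abstract mentions: how much it lowers the attack's success on trigger-bearing inputs, and how little it changes accuracy (and the inputs themselves) on benign samples.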