Paper title
Are DNNs fooled by extremely unrecognizable images?
Paper authors
Paper abstract
Fooling images are a potential threat to deep neural networks (DNNs). These images are not recognizable to humans as natural objects, such as dogs and cats, but are misclassified by DNNs as natural-object classes with high confidence scores. Despite their original design concept, existing fooling images retain some features characteristic of the target objects when examined closely, so DNNs can react to these features. In this paper, we ask whether fooling images can exist with no characteristic pattern of natural objects, either locally or globally. As a minimal case, we introduce single-color images with a few pixels altered, called sparse fooling images (SFIs). We first prove that SFIs always exist under mild conditions for linear and nonlinear models, and reveal that complex models are more likely to be vulnerable to SFI attacks. Using two SFI generation methods, we demonstrate that in deeper layers SFIs end up with features similar to those of natural images and consequently fool DNNs successfully. Among the layers, we find that the max pooling layer causes the vulnerability to SFIs. Defenses against SFIs and the transferability of SFIs are also discussed. This study highlights a new vulnerability of DNNs by introducing a novel class of images that is distributed extremely far from natural images.
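To make the idea of an SFI concrete, the following is a minimal sketch for a toy linear classifier. It is not one of the two generation methods mentioned in the abstract, whose details are not given here. The random weights, the 64-pixel image size, the greedy selection rule, and the helper names `make_sparse_fooling_image` and `margin` are all assumptions made for illustration. Starting from a single-color image, the sketch alters only `k` pixels, choosing those that most increase the target-class logit relative to the average of the other logits.

```python
import random

def linear_logits(W, b, x):
    # Logits of a linear model: one dot product per class, plus a bias.
    return [sum(w * v for w, v in zip(row, x)) + bc for row, bc in zip(W, b)]

def margin(W, b, x, target):
    # Target logit minus the mean of the other logits (illustrative score).
    z = linear_logits(W, b, x)
    others = [v for c, v in enumerate(z) if c != target]
    return z[target] - sum(others) / len(others)

def make_sparse_fooling_image(W, target, base=0.5, k=5):
    # Hypothetical greedy rule (an assumption, not the paper's method):
    # start from a single-color image and push the k most influential
    # pixels to the extreme value (0 or 1) that raises the margin.
    n_classes, d = len(W), len(W[0])
    others = [c for c in range(n_classes) if c != target]
    # Gradient of the margin with respect to each pixel.
    g = [W[target][i] - sum(W[c][i] for c in others) / len(others)
         for i in range(d)]
    # Margin gained by moving pixel i from `base` to its best extreme.
    gain = [gi * (1 - base) if gi > 0 else -gi * base for gi in g]
    chosen = sorted(range(d), key=lambda i: gain[i], reverse=True)[:k]
    x = [base] * d
    for i in chosen:
        x[i] = 1.0 if g[i] > 0 else 0.0
    return x, chosen

# Toy setup: 10 classes, 64 "pixels", random weights (all assumed).
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(10)]
b = [0.0] * 10
gray = [0.5] * 64
sfi, changed = make_sparse_fooling_image(W, target=3, k=5)
print("margin before:", margin(W, b, gray, 3))
print("margin after: ", margin(W, b, sfi, 3))
```

The sketch only guarantees that the margin for the target class increases while the image stays single-colored except for `k` pixels. Whether the model actually predicts the target class depends on the weights and on `k`, which is the kind of condition the existence result in the abstract addresses.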