EViT: Privacy-Preserving Image Retrieval via Encrypted Vision Transformer in Cloud Computing

Paper Title

EViT: Privacy-Preserving Image Retrieval via Encrypted Vision Transformer in Cloud Computing

Authors

Feng, Qihua; Li, Peiya; Lu, Zhixun; Li, Chaozhuo; Wang, Zefang; Liu, Zhiquan; Duan, Chunhui; Huang, Feiran

Abstract

Image retrieval systems help users browse and search large image collections in real time. With the rise of cloud computing, retrieval tasks are usually outsourced to cloud servers. However, the cloud scenario poses a daunting privacy-protection challenge, as cloud servers cannot be fully trusted. To this end, image-encryption-based privacy-preserving image retrieval schemes have been developed, which first extract features from cipher-images and then build retrieval models on these features. Yet most existing approaches extract shallow features and design trivial retrieval models, resulting in insufficient expressiveness for cipher-images. In this paper, we propose a novel paradigm named Encrypted Vision Transformer (EViT), which improves the discriminative representation capability of cipher-images. First, to capture comprehensive ruled information, we extract multi-level local length sequence features and global Huffman-code frequency features from cipher-images that are encrypted by a stream cipher during the JPEG compression process. Second, we design a Vision Transformer-based retrieval model that couples with the multi-level features, and propose two adaptive data augmentation methods to improve the representation power of the retrieval model. Our proposal can be easily adapted to both unsupervised and supervised settings via self-supervised contrastive learning. Extensive experiments show that EViT achieves both excellent encryption and retrieval performance, outperforming current schemes in retrieval accuracy by large margins while effectively protecting image privacy. Code is publicly available at https://github.com/onlinehuazai/EViT.
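
One way features can be extracted directly from cipher-images is for the encryption to leave part of the JPEG entropy-coding statistics intact. The sketch below illustrates that idea under explicit assumptions and is not the EViT encryption scheme or feature extractor from the linked repository. It uses a simplified baseline-JPEG run-length coding of one block's 63 AC coefficients and a "stream cipher" that XORs only the appended amplitude bits. Python's `random.Random` stands in for the keystream here and is not cryptographically secure. A normalized (run, size) symbol histogram stands in for the global Huffman-code frequency feature. All function names (`to_symbols`, `amplitude`, `encrypt`, `huffman_symbol_histogram`) and the sample data are hypothetical.

```python
import random
from collections import Counter


def to_symbols(ac):
    """Run-length code 63 zigzag-ordered AC coefficients of one block into
    JPEG-style (run, size, amplitude_bits) triples (hypothetical helper).
    (0, 0, 0) marks EOB and (15, 0, 0) marks ZRL (sixteen zeros)."""
    out, run = [], 0
    # Index of the last nonzero coefficient; the trailing zeros become one EOB.
    last = max((i for i, v in enumerate(ac) if v != 0), default=-1)
    for v in ac[: last + 1]:
        if v == 0:
            run += 1
            continue
        while run > 15:
            out.append((15, 0, 0))  # ZRL
            run -= 16
        size = abs(v).bit_length()
        # JPEG amplitude bits: v itself if positive, v + 2**size - 1 if negative.
        bits = v if v > 0 else v + (1 << size) - 1
        out.append((run, size, bits))
        run = 0
    if last < len(ac) - 1:
        out.append((0, 0, 0))  # EOB
    return out


def amplitude(size, bits):
    """Inverse of the JPEG amplitude mapping (size 0 means no amplitude)."""
    if size == 0:
        return 0
    return bits if bits >> (size - 1) else bits - (1 << size) + 1


def encrypt(symbols, keystream):
    """Assumed toy cipher: XOR only the amplitude bits with keystream bits.
    The (run, size) pairs, which are the Huffman-coded symbols, are unchanged."""
    out = []
    for run, size, bits in symbols:
        mask = keystream.getrandbits(size) if size else 0
        out.append((run, size, bits ^ mask))
    return out


def huffman_symbol_histogram(blocks):
    """Normalized frequency of (run, size) symbols over all blocks; a
    stand-in for a global Huffman-code frequency feature."""
    counts = Counter((run, size) for blk in blocks for run, size, _ in blk)
    total = sum(counts.values())
    return {sym: n / total for sym, n in counts.items()}


rng = random.Random(0)
# Hypothetical quantized AC coefficients for four blocks (mostly zeros).
plain = [[rng.choice([0] * 12 + [1, -1, 2, -3, 5, -7]) for _ in range(63)]
         for _ in range(4)]
plain_syms = [to_symbols(b) for b in plain]

# random.Random is NOT a secure stream cipher; it only stands in for a keystream.
key = random.Random(2024)
cipher_syms = [encrypt(s, key) for s in plain_syms]

print(huffman_symbol_histogram(plain_syms) == huffman_symbol_histogram(cipher_syms))  # True
print([amplitude(s, b) for _, s, b in plain_syms[0][:6]])
print([amplitude(s, b) for _, s, b in cipher_syms[0][:6]])
```

Every SIZE-bit amplitude pattern decodes to a value in the same size category. XORing those bits therefore changes the coefficient values, and with them the decoded image, while the emitted Huffman codes and amplitude bit counts stay the same. That is why the symbol histogram computed on the cipher side equals the plaintext one. A real scheme may also encrypt other components (for example DC coefficients or block order), so this sketch only illustrates the invariance idea.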
