论文标题
Heaan Demysosified:通过以架构为中心的分析和优化加速完全同态加密
HEAAN Demystified: Accelerating Fully Homomorphic Encryption Through Architecture-centric Analysis and Optimization
论文作者
论文摘要
同态加密(HE)将大量关注作为云计算的一种隐私性方法,因为它允许在称为密文的加密消息上计算。在他提出的许多计划中,他的算术数(Heaan)在广泛的应用中迅速越来越受欢迎,因为它支持可以容忍近似计算而无需限制适用于相应密封文本的算术操作数量的消息。 HE的一个严重缺点是密文算术的高计算复杂性。尤其是,他的乘法(HE MUL)比未加密消息之间的相应乘法慢10,000倍。这导致了大量的HE加速度研究,包括利用FPGA的研究;但是,这些并未对MUL的计算复杂性和数据访问模式进行严格的分析。此外,这些建议主要集中在具有较小参数尺寸的设计上,因此很难准确估计其在进行一系列复杂算术操作时的性能。在本文中,我们首先描述了如何以对计算机建筑师友好的方式进行Heaan的穆尔。然后,我们对其计算和内存访问特性进行了纪律严明的分析,通过该分析,我们(1)通过该分析在构成他的关键功能中提取并行性,(2)演示如何有效地将并行性映射到流行的并行处理平台,多核CPU和GPU,以及应用一系列优化技术,例如应用诸如传输矩阵和固定数据,并将其用于固定数据。这导致在CPU和GPU上的MUL的性能提高了42.9倍和134.1倍,在CPU上运行的单线程参考Heaan。进行的分析和优化将为未来的他加速研究树立新的基础。
Homomorphic Encryption (HE) draws a significant attention as a privacy-preserving way for cloud computing because it allows computation on encrypted messages called ciphertexts. Among numerous HE schemes proposed, HE for Arithmetic of Approximate Numbers (HEAAN) is rapidly gaining popularity across a wide range of applications because it supports messages that can tolerate approximate computation with no limit on the number of arithmetic operations applicable to the corresponding ciphertexts. A critical shortcoming of HE is the high computation complexity of ciphertext arithmetic; especially, HE multiplication (HE Mul) is more than 10,000 times slower than the corresponding multiplication between unencrypted messages. This leads to a large body of HE acceleration studies, including ones exploiting FPGAs; however, those did not conduct a rigorous analysis of computational complexity and data access patterns of HE Mul. Moreover, the proposals mostly focused on designs with small parameter sizes, making it difficult to accurately estimate their performance in conducting a series of complex arithmetic operations. In this paper, we first describe how HE Mul of HEAAN is performed in a manner friendly to computer architects. Then we conduct a disciplined analysis on its computational and memory access characteristics, through which we (1) extract parallelism in the key functions composing HE Mul and (2) demonstrate how to effectively map the parallelism to the popular parallel processing platforms, multicore CPUs and GPUs, by applying a series of optimization techniques such as transposing matrices and pinning data to threads. This leads to the performance improvement of HE Mul on a CPU and a GPU by 42.9x and 134.1x, respectively, over the single-thread reference HEAAN running on a CPU. The conducted analysis and optimization would set a new foundation for future HE acceleration research.