CuPBoP: CUDA for Parallelized and Broad-range Processors

Paper title

CuPBoP: CUDA for Parallelized and Broad-range Processors

Authors

Ruobing Han, Jun Chen, Bhanu Garg, Jeffrey Young, Jaewoong Sim, Hyesoon Kim

Abstract

CUDA is one of the most popular choices for GPU programming, but it can only be executed on NVIDIA GPUs. Executing CUDA on non-NVIDIA devices not only benefits the hardware community, but also allows data-parallel computation in heterogeneous systems. To make CUDA programs portable, some researchers have proposed using source-to-source translators to translate CUDA to portable programming languages that can be executed on non-NVIDIA devices. However, most CUDA translators require additional manual modifications to the translated code, which imposes a heavy workload on developers. In this paper, CuPBoP is proposed to execute CUDA on non-NVIDIA devices without relying on any portable programming languages. Compared with existing work that executes CUDA on non-NVIDIA devices, CuPBoP does not require manual modification of the CUDA source code, yet it achieves the highest coverage on the Rodinia benchmark (69.6%), much higher than existing frameworks (56.6%). In particular, for CPU backends, CuPBoP supports several ISAs (e.g., x86, RISC-V, AArch64) and achieves performance close to or even higher than that of other projects. We also compare and analyze the performance of CuPBoP, manually optimized OpenMP/MPI programs, and CUDA programs on the latest Ampere architecture GPU, and show future directions for supporting CUDA programs on non-NVIDIA devices with high performance.
