Paper Title

CuPBoP: CUDA for Parallelized and Broad-range Processors

Authors

Ruobing Han, Jun Chen, Bhanu Garg, Jeffrey Young, Jaewoong Sim, Hyesoon Kim

Abstract

CUDA is one of the most popular choices for GPU programming, but it can only be executed on NVIDIA GPUs. Executing CUDA on non-NVIDIA devices not only benefits the hardware community, but also enables data-parallel computation in heterogeneous systems. To make CUDA programs portable, some researchers have proposed source-to-source translators that convert CUDA into portable programming languages that can run on non-NVIDIA devices. However, most CUDA translators require additional manual modifications to the translated code, which imposes a heavy workload on developers. This paper proposes CuPBoP, which executes CUDA on non-NVIDIA devices without relying on any portable programming language. Compared with existing work that executes CUDA on non-NVIDIA devices, CuPBoP requires no manual modification of the CUDA source code, yet it achieves the highest coverage on the Rodinia benchmark (69.6%), much higher than existing frameworks (56.6%). In particular, for CPU backends, CuPBoP supports several ISAs (e.g., x86, RISC-V, AArch64) and achieves performance close to or even higher than that of other projects. We also compare and analyze the performance of CuPBoP, manually optimized OpenMP/MPI programs, and CUDA programs on the latest Ampere architecture GPU, and outline future directions for supporting CUDA programs on non-NVIDIA devices with high performance.
