Automatic Horizontal Fusion for GPU Kernels

Paper Title

Automatic Horizontal Fusion for GPU Kernels

Authors

Ao Li, Bojian Zheng, Gennady Pekhimenko, Fan Long

Abstract

We present automatic horizontal fusion, a novel optimization technique that complements the standard kernel fusion techniques for GPU programs. Unlike standard fusion, whose goal is to eliminate intermediate data round trips, our horizontal fusion technique aims to increase thread-level parallelism to hide instruction latencies. We also present HFuse, a new source-to-source CUDA compiler that implements automatic horizontal fusion. Our experimental results show that horizontal fusion can speed up the running time by 2.5%-60.8%. Our results reveal that horizontal fusion is especially beneficial for fusing kernels whose instructions require different kinds of GPU resources (e.g., a memory-intensive kernel and a compute-intensive kernel).
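The idea in the abstract can be illustrated with a minimal sketch. The plain-Python code below emulates on the CPU how a horizontally fused kernel might divide one launch's thread range between two independent kernels. It is not HFuse output and does not model latency hiding or GPU scheduling. The kernel names, their bodies, and the launch helper are all hypothetical, chosen only to stand in for a memory-intensive and a compute-intensive kernel.

```python
# CPU emulation of horizontal fusion: a single launch whose thread range is
# partitioned between two independent kernels. All names are hypothetical.

def scale_kernel(tid, data, out):
    # Stand-in for a memory-intensive kernel: one load and one store per thread.
    out[tid] = 2.0 * data[tid]

def poly_kernel(tid, data, out):
    # Stand-in for a compute-intensive kernel: repeated arithmetic per thread.
    x = data[tid]
    acc = 0.0
    for _ in range(8):
        acc = acc * x + 1.0
    out[tid] = acc

def fused_kernel(tid, n1, a, a_out, b, b_out):
    # Threads [0, n1) execute the first kernel. The remaining threads execute
    # the second kernel, with the index rebased so it again starts at 0.
    if tid < n1:
        scale_kernel(tid, a, a_out)
    else:
        poly_kernel(tid - n1, b, b_out)

def launch(kernel, num_threads, *args):
    # Sequential stand-in for a GPU launch; real GPU threads run concurrently.
    for tid in range(num_threads):
        kernel(tid, *args)

def run_separately(a, b):
    # Baseline: two launches, one per kernel.
    a_out, b_out = [0.0] * len(a), [0.0] * len(b)
    launch(scale_kernel, len(a), a, a_out)
    launch(poly_kernel, len(b), b, b_out)
    return a_out, b_out

def run_fused(a, b):
    # Fused: one launch covering the combined thread count of both kernels.
    a_out, b_out = [0.0] * len(a), [0.0] * len(b)
    launch(fused_kernel, len(a) + len(b), len(a), a, a_out, b, b_out)
    return a_out, b_out
```

Calling run_fused and run_separately on the same inputs should give identical outputs; the sketch shows only the index partitioning, since any speedup on a real GPU would come from both kinds of threads being resident at the same time.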
