Paper Title
Techniques, Tricks and Algorithms for Efficient GPU-Based Processing of Higher Order Hyperbolic PDEs
Paper Authors
Paper Abstract
GPU computing is expected to play an integral part in all modern Exascale supercomputers. It is also expected that higher order Godunov schemes will make up a significant fraction of the application mix on such supercomputers. It is, therefore, very important to prepare the community of users of higher order schemes for hyperbolic PDEs for this emerging opportunity. We focus on three broad and high-impact areas where higher order Godunov schemes are used. The first area is computational fluid dynamics (CFD). The second is computational magnetohydrodynamics (MHD), which has an involution constraint that has to be mimetically preserved. The third is computational electrodynamics (CED), which has involution constraints and also extremely stiff source terms. Together, these three diverse uses of higher order Godunov methodology cover many of the most important application areas. In all three cases, we show that the optimal use of algorithms, techniques and tricks, along with the use of OpenACC, yields superlative speedups on GPUs! As a bonus, we find a most remarkable and desirable result: some higher order schemes, with their larger operation count per zone, show better speedup on GPUs than lower order schemes. In other words, the GPU is an optimal stratagem for overcoming the higher computational complexity of higher order schemes! Several avenues for future improvement have also been identified. A scalability study is presented for a real-world application using GPUs and comparable numbers of high-end multicore CPUs. It is found that GPUs offer a substantial performance benefit over a comparable number of CPUs, especially when all the methods designed in this paper are used.