Multi-GPU implementation of a time-explicit finite volume solver for the Shallow-Water Equations using CUDA and a CUDA-Aware version of OpenMPI

Paper title

Multi-GPU implementation of a time-explicit finite volume solver for the Shallow-Water Equations using CUDA and a CUDA-Aware version of OpenMPI

Authors

Vincent Delmas, Azzedine Soulaïmani

Abstract

This paper shows the development of a multi-GPU version of a time-explicit finite volume solver for the Shallow-Water Equations (SWE) on a multi-GPU architecture. MPI is combined with CUDA-Fortran in order to use as many GPUs as needed. The METIS library is leveraged to perform a domain decomposition on the 2D unstructured triangular meshes of interest. A CUDA-Aware OpenMPI version is adopted to speed up the messages between the MPI processes. A study of both speed-up and efficiency is conducted; first, for a classic dam-break flow in a canal, and then for two real domains with complex bathymetries: the Mille Îles river and the Montreal archipelago. In both cases, meshes with up to 13 million cells are used. Using 24 to 28 GPUs on these meshes leads to an efficiency of 80% and more. Finally, the multi-GPU version is compared to the pure MPI multi-CPU version, and it is concluded that in this particular case, about 100 CPU cores would be needed to achieve the same performance as one GPU.
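
The speed-up and efficiency figures quoted in the abstract can be read against the usual strong-scaling definitions. The abstract does not state the exact formulas used in the paper, so the sketch below assumes the conventional ones: speed-up is the reference run time divided by the N-device run time, and efficiency is that speed-up divided by the ratio of device counts. The function names and the timings are hypothetical and serve only as an illustration; they are not results from the paper.

```python
def speedup(t_ref, t_n):
    """Speed-up of a run taking t_n seconds relative to a reference run of t_ref seconds."""
    return t_ref / t_n


def efficiency(t_ref, t_n, n_devices, n_ref=1):
    """Parallel efficiency: measured speed-up divided by the ideal speed-up n_devices / n_ref."""
    return speedup(t_ref, t_n) * n_ref / n_devices


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, keyed by number of GPUs.
    # These numbers are made up for illustration only.
    timings = {1: 1000.0, 2: 520.0, 4: 270.0, 8: 145.0}
    t_one = timings[1]
    for n, t in timings.items():
        s = speedup(t_one, t)
        e = efficiency(t_one, t, n)
        print(f"{n:2d} GPUs: speed-up {s:5.2f}, efficiency {e:6.1%}")
```

Under these assumed definitions, an efficiency of 80% on 25 devices corresponds to a speed-up of 20 over the single-device reference. The `n_ref` parameter covers the case where the largest meshes do not fit on a single device and the baseline is a multi-device run.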
