m-Cubes: An efficient and portable implementation of multi-dimensional integration for GPUs

Paper title

m-Cubes: An efficient and portable implementation of multi-dimensional integration for GPUs

Authors

Sakiotis, Ioannis; Arumugam, Kamesh; Paterno, Marc; Ranjan, Desh; Terzic, Balsa; Zubair, Mohammad

Abstract

The task of multi-dimensional numerical integration is frequently encountered in physics and other scientific fields, e.g., in modeling the effects of systematic uncertainties in physical systems and in Bayesian parameter estimation. Multi-dimensional integration is often time-prohibitive on CPUs. Efficient implementation on many-core architectures is challenging because the workload across the integration space cannot be predicted a priori. We propose m-Cubes, a novel implementation of the well-known Vegas algorithm for execution on GPUs. Vegas transforms the integration variables and then calculates a Monte Carlo integral estimate using adaptive partitioning of the resulting space. m-Cubes improves performance on GPUs by maintaining a relatively uniform workload across the processors. As a result, our optimized Cuda implementation for Nvidia GPUs outperforms parallelization approaches proposed in past literature. We further demonstrate the efficiency of m-Cubes by evaluating a six-dimensional integral from a cosmology application, achieving significant speedup and greater precision than the Cuba library's CPU implementation of Vegas. We also evaluate m-Cubes on a standard integrand test suite. m-Cubes outperforms the serial implementations of the Cuba and GSL libraries by orders of magnitude while maintaining comparable accuracy. Our approach yields a speedup of at least a factor of 10 when compared against publicly available Monte Carlo based GPU implementations. In summary, m-Cubes can solve integrals that are prohibitively expensive using standard libraries and custom implementations. A modern C++ interface and a header-only implementation make m-Cubes portable, allowing its use in complicated pipelines with easy-to-define stateful integrals. Compatibility with non-Nvidia GPUs is achieved with our initial implementation of m-Cubes using the Kokkos framework.
