Title
m-Cubes: An efficient and portable implementation of multi-dimensional integration for GPUs
Authors
Abstract
Multi-dimensional numerical integration is frequently encountered in physics and other scientific fields, e.g., in modeling the effects of systematic uncertainties in physical systems and in Bayesian parameter estimation. Such integration is often time-prohibitive on CPUs. Efficient implementation on many-core architectures is challenging because the workload across the integration space cannot be predicted a priori. We propose m-Cubes, a novel implementation of the well-known Vegas algorithm for execution on GPUs. Vegas transforms the integration variables and then computes a Monte Carlo integral estimate using adaptive partitioning of the resulting space. m-Cubes improves performance on GPUs by maintaining a relatively uniform workload across the processors. As a result, our optimized CUDA implementation for NVIDIA GPUs outperforms parallelization approaches proposed in past literature. We further demonstrate the efficiency of m-Cubes by evaluating a six-dimensional integral from a cosmology application, achieving significant speedup and greater precision than the Cuba library's CPU implementation of Vegas. We also evaluate m-Cubes on a standard integrand test suite. m-Cubes outperforms the serial implementations of the Cuba and GSL libraries by orders of magnitude while maintaining comparable accuracy. Our approach yields a speedup of at least a factor of 10 when compared against publicly available Monte Carlo-based GPU implementations. In summary, m-Cubes can solve integrals that are prohibitively expensive using standard libraries and custom implementations. A modern C++ interface and header-only implementation make m-Cubes portable, allowing its use in complicated pipelines with easy-to-define stateful integrals. Compatibility with non-NVIDIA GPUs is achieved with our initial implementation of m-Cubes using the Kokkos framework.
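The Vegas procedure summarized in the abstract (a change of integration variables, followed by a Monte Carlo estimate on an adaptively partitioned space) can be illustrated with a minimal serial sketch. This is not the m-Cubes implementation and does not reflect its C++ or CUDA interface. It is a plain-Python illustration of the general Vegas idea on the unit hypercube, under stated assumptions: the names `refine_edges`, `vegas_sketch`, and `gaussian_peak` are hypothetical, and the square-root damping of bin weights is a simplification chosen for brevity rather than the damping rule used by production Vegas codes.

```python
import math
import random


def refine_edges(edges, weights, floor=1e-3):
    """Move bin edges so each new bin holds an equal share of the damped weights."""
    n_bins = len(weights)
    total = sum(weights)
    if total == 0.0:
        return list(edges)  # nothing learned in this pass; keep the grid
    # Simplified damping (assumption): square root plus a small floor so that
    # no bin collapses to zero width.
    damped = [math.sqrt(w / total) + floor for w in weights]
    target = sum(damped) / n_bins  # damped weight each new bin should hold
    new_edges = [edges[0]]
    acc = 0.0  # weight gathered so far for the new bin being built
    for i, w in enumerate(damped):
        lo, hi = edges[i], edges[i + 1]
        remaining = w  # part of old bin i not yet assigned to a new bin
        while acc + remaining >= target and len(new_edges) < n_bins:
            need = target - acc
            frac = min((w - remaining + need) / w, 1.0)
            new_edges.append(lo + (hi - lo) * frac)
            remaining -= need
            acc = 0.0
        acc += remaining
    new_edges.append(edges[-1])
    return new_edges


def vegas_sketch(f, dim, n_bins=50, n_samples=2000, n_iters=5, seed=0):
    """Estimate the integral of f over [0, 1]^dim with a separable adaptive grid."""
    rng = random.Random(seed)
    grids = [[i / n_bins for i in range(n_bins + 1)] for _ in range(dim)]
    results = []  # (estimate, variance of the estimate) for each iteration
    for _ in range(n_iters):
        weights = [[0.0] * n_bins for _ in range(dim)]
        s1 = s2 = 0.0
        for _ in range(n_samples):
            x, bins, jac = [], [], 1.0
            for d in range(dim):
                # Uniform y picks a bin with equal probability; narrow bins
                # therefore receive a higher sampling density in x.
                y = rng.random() * n_bins
                b = min(int(y), n_bins - 1)
                lo, hi = grids[d][b], grids[d][b + 1]
                x.append(lo + (y - b) * (hi - lo))
                bins.append(b)
                jac *= n_bins * (hi - lo)  # Jacobian of the variable change
            fx = f(x) * jac
            s1 += fx
            s2 += fx * fx
            for d, b in enumerate(bins):
                weights[d][b] += fx * fx
        mean = s1 / n_samples
        var = (s2 / n_samples - mean * mean) / (n_samples - 1)
        results.append((mean, max(var, 1e-300)))
        grids = [refine_edges(g, w) for g, w in zip(grids, weights)]
    # Combine iterations with inverse-variance weights.
    wsum = sum(1.0 / v for _, v in results)
    estimate = sum(m / v for m, v in results) / wsum
    return estimate, math.sqrt(1.0 / wsum)


def gaussian_peak(x, sigma=0.1):
    """Normalized Gaussian centered in the unit cube; its integral is close to 1."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    value = 1.0
    for xi in x:
        value *= math.exp(-0.5 * ((xi - 0.5) / sigma) ** 2) / norm
    return value


if __name__ == "__main__":
    est, err = vegas_sketch(gaussian_peak, dim=3)
    print(f"estimate = {est:.5f} +/- {err:.5f}")
```

In this sketch every sample is processed in a single serial loop. The sample loop is the part a GPU implementation would distribute across processors, which is where the uniform-workload concern raised in the abstract arises.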