Paper Title

MPI Collectives for Multi-core Clusters: Optimized Performance of the Hybrid MPI+MPI Parallel Codes

Authors

Huan Zhou, Jose Gracia, Ralf Schneider

Abstract

The advent of multi-/many-core processors in clusters motivates hybrid parallel programming, which combines the Message Passing Interface (MPI) for inter-node parallelism with a shared-memory model for on-node parallelism. Compared to the traditional hybrid approach of MPI plus OpenMP, a new but promising hybrid approach of MPI plus MPI-3 shared-memory extensions (MPI+MPI) is gaining traction. We describe an algorithmic approach for collective operations (with allgather and broadcast as concrete examples) in the context of hybrid MPI+MPI that minimizes memory consumption and memory copies. With this approach, only one copy of the data is maintained per node and shared by all on-node processes. This removes the unnecessary on-node copies of replicated data that are required between MPI processes when the collectives are invoked in the context of pure MPI. We compare our approach to collectives for hybrid MPI+MPI with the traditional one for pure MPI, and discuss the synchronization required to guarantee data integrity. The performance of our approach has been validated on a Cray XC40 system (Cray MPI) and an NEC cluster (OpenMPI), showing that it achieves comparable or better performance for allgather operations. We have further validated our approach with a standard computational kernel, namely distributed matrix multiplication, and with a Bayesian Probabilistic Matrix Factorization code.
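
The memory saving claimed for allgather can be made concrete with a little arithmetic. The sketch below is not taken from the paper; it is a minimal model under two stated assumptions: every node runs the same number of processes, and in pure MPI each process holds a private receive buffer for the full gathered result, whereas in hybrid MPI+MPI one shared buffer per node holds it. The function name and the sample sizes are hypothetical and chosen only for illustration.

```python
def allgather_recv_bytes_per_node(num_nodes, procs_per_node, msg_bytes):
    """Return (pure_mpi, hybrid) receive-buffer bytes held on one node.

    Assumptions (illustrative, not from the paper): equal process count
    on every node, and each process contributes msg_bytes to the allgather.
    """
    total_procs = num_nodes * procs_per_node
    # The gathered result contains one contribution from every process.
    result_bytes = total_procs * msg_bytes
    # Pure MPI: every on-node process keeps its own private copy of the result.
    pure_mpi = procs_per_node * result_bytes
    # Hybrid MPI+MPI: a single copy lives in a shared-memory window per node.
    hybrid = result_bytes
    return pure_mpi, hybrid


if __name__ == "__main__":
    # Hypothetical configuration: 4 nodes, 24 processes per node, 8 KiB each.
    pure, hybrid = allgather_recv_bytes_per_node(4, 24, 8192)
    print(f"pure MPI:   {pure} bytes per node")
    print(f"hybrid:     {hybrid} bytes per node")
    print(f"reduction:  {pure // hybrid}x")
```

Under these assumptions the per-node footprint of the gathered data shrinks by a factor equal to the number of processes per node. The model covers only buffer sizes; it says nothing about the synchronization cost that the abstract notes is needed to keep the shared copy consistent.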
