Paper Title

Performance and energy consumption of HPC workloads on a cluster based on Arm ThunderX2 CPU

Authors

Mantovani, Filippo; Garcia-Gasulla, Marta; Gracia, José; Stafford, Esteban; Banchelli, Fabio; Josep-Fabrego, Marc; Criado-Ledesma, Joel; Nachtmann, Mathias

Abstract

In this paper, we analyze the performance and energy consumption of an Arm-based high-performance computing (HPC) system developed within the European project Mont-Blanc 3. This system, called Dibona, was integrated by ATOS/Bull and is powered by Marvell's latest CPU, the ThunderX2. This is the same CPU that powers the Astra supercomputer, the first Arm-based supercomputer to enter the Top500, in November 2018. Our study ranges from micro-benchmarks to large production codes. We include an interdisciplinary evaluation of three scientific applications (a finite-element fluid dynamics code, a smoothed particle hydrodynamics code, and a lattice Boltzmann code) and the Graph 500 benchmark, focusing on parallel and energy efficiency and studying their scalability up to thousands of Armv8 cores. For comparison, we run the same tests on state-of-the-art x86 nodes included in Dibona and in the Tier-0 supercomputer MareNostrum4. Our experiments show that the ThunderX2 delivers 25% lower performance on average, mainly due to its small vector unit, although this is somewhat compensated by its 30% wider links between the CPU and main memory. We found that the software ecosystem of the Armv8 architecture is comparable to the one available for Intel. Our results also show that the ThunderX2 delivers similar or better energy-to-solution and scalability, indicating that Arm-based chips are legitimate contenders in the market for next-generation HPC systems.
