Paper Title
Prediction of High-Performance Computing Input/Output Variability and Its Application to Optimization for System Configurations
Authors
Abstract
Performance variability is an important measure for a reliable high-performance computing (HPC) system. It is affected by complicated interactions among numerous factors, such as CPU frequency, the number of input/output (IO) threads, and the IO scheduler. In this paper, we focus on HPC IO variability. The prediction of HPC variability is a challenging problem in the engineering of HPC systems, and there is little statistical work on this problem to date. Although there are many methods available in the computer experiment literature, the applicability of existing methods to HPC performance variability needs investigation, especially when the objective is to predict performance variability in both interpolation and extrapolation settings. A data analytic framework is developed to model data collected from large-scale experiments. Various promising methods are used to build predictive models for the variability of HPC systems. We evaluate the performance of the methods by measuring prediction accuracy at previously unseen system configurations. We also discuss a methodology for optimizing system configurations that uses the estimated variability map. The findings from the method comparisons and the tool sets developed in this paper yield new insights into existing statistical methods and can be beneficial for the practice of HPC variability management. This paper has supplementary materials online.