ScalAna: Automating Scaling Loss Detection with Graph Analysis

Paper Title

ScalAna: Automating Scaling Loss Detection with Graph Analysis

Authors

Yuyang Jin, Haojie Wang, Teng Yu, Xiongchao Tang, Torsten Hoefler, Xu Liu, Jidong Zhai

Abstract

Scaling a parallel program to modern supercomputers is challenging due to inter-process communication, Amdahl's law, and resource contention. Performance analysis tools for finding such scaling bottlenecks are based on either profiling or tracing. Profiling incurs low overhead but does not capture the detailed dependencies needed for root-cause analysis. Tracing collects all information, but at prohibitive overhead. In this work, we design ScalAna, which uses static analysis techniques to achieve the best of both worlds: it enables the analyzability of traces at a cost similar to profiling. ScalAna first leverages static compiler techniques to build a Program Structure Graph, which records the main computation and communication patterns as well as the program's control structures. At runtime, we adopt lightweight techniques to collect performance data according to the graph structure and generate a Program Performance Graph. With this graph, we propose a novel approach, called backtracking root cause detection, which can automatically and efficiently detect the root cause of scaling loss. We evaluate ScalAna with real applications. Results show that our approach can effectively locate the root cause of scaling loss for real applications and incurs 1.73% overhead on average for up to 2,048 processes. We achieve up to 11.11% performance improvement by fixing the root causes detected by ScalAna on 2,048 processes.
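The abstract names backtracking root cause detection over a Program Performance Graph but does not spell out the algorithm. As a rough illustration of the general idea only, the following toy sketch assumes a hypothetical graph whose vertices carry timings at two process counts and list the vertices they wait on. The `Vertex` class, the `scaling_ratio` metric, the threshold value, and all vertex names and timings are invented for this example; none of it is taken from ScalAna itself.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """One node of a toy performance graph (hypothetical structure, not ScalAna's)."""
    name: str
    time_small: float  # seconds measured at the smaller process count
    time_large: float  # seconds measured at the larger process count
    waits_on: list = field(default_factory=list)  # names of vertices this one depends on


def scaling_ratio(v):
    """Time at the larger scale divided by time at the smaller scale."""
    return v.time_large / v.time_small


def backtrack_root_causes(graph, start, threshold):
    """Walk dependence edges backwards from `start`, following only poorly scaling vertices.

    `start` is assumed to be a vertex already known to scale poorly.
    Returns the visited vertices that have no poorly scaling predecessor.
    """
    roots, seen, stack = [], set(), [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        bad_preds = [p for p in graph[name].waits_on
                     if scaling_ratio(graph[p]) > threshold]
        if bad_preds:
            stack.extend(bad_preds)  # the slowdown may be inherited; keep going back
        else:
            roots.append(name)       # nothing upstream explains it; report this vertex
    return roots


# Made-up timings: the collective looks slow, but it is waiting on earlier work.
graph = {
    "loop_compute": Vertex("loop_compute", 10.0, 9.5),
    "halo_exchange": Vertex("halo_exchange", 2.0, 3.0, waits_on=["loop_compute"]),
    "io_write": Vertex("io_write", 1.0, 0.5),
    "allreduce": Vertex("allreduce", 1.0, 4.0, waits_on=["halo_exchange", "io_write"]),
}
result = backtrack_root_causes(graph, "allreduce", threshold=0.75)
print(result)
```

In this made-up example the walk starts at the slow `allreduce`, skips `io_write` because it scales well, and follows `halo_exchange` back to `loop_compute`, which is reported as the candidate root cause. A real tool would need per-process data and inter-process communication dependences, which this sketch leaves out.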
