Paper Title

Taskgraph: A Low Contention OpenMP Tasking Framework

Authors

Chenle Yu, Sara Royuela, Eduardo Quiñones

Abstract

OpenMP is the de facto standard for shared-memory systems in High-Performance Computing (HPC). It includes a task-based model that offers a high level of abstraction to exploit highly dynamic structured and unstructured parallelism in an easy and flexible way. Unfortunately, the run-time overheads introduced to manage tasks are very high in most common OpenMP frameworks (e.g., GCC, LLVM), which defeats the potential benefits of the tasking model and makes it suitable for coarse-grained tasks only. This paper presents taskgraph, a framework that uses a task dependency graph (TDG) to represent a region of code implemented with OpenMP tasks in order to reduce the run-time overheads associated with the management of tasks, i.e., contention and parallel orchestration, including task creation and synchronization. The TDG avoids the overheads related to the resolution of task dependencies and greatly reduces those deriving from accesses to shared resources. Moreover, the taskgraph framework introduces into OpenMP a record-and-replay execution model that accelerates the taskgraph region from its second execution onward. Overall, the optimizations presented in this paper allow exploiting fine-grained OpenMP tasks, in line with the trend in current applications toward massive on-node parallelism and fine-grained, dynamic scheduling paradigms. The framework is implemented on LLVM 15.0. Results show that the taskgraph implementation outperforms the vanilla OpenMP system in terms of performance and scalability, for both structured and unstructured parallelism and for both coarse- and fine-grained tasks. Furthermore, the proposed framework considerably reduces the performance gap between the task and the thread models of OpenMP.
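To make concrete the kind of code region the abstract refers to, the following is a minimal sketch in C that uses only standard OpenMP `task` constructs with `depend` clauses. It does not use the taskgraph framework itself, whose directive syntax the abstract does not give. The function name `run_chain` and the three-stage chain are hypothetical and chosen only for illustration. The point is that every iteration creates the same small dependency graph, which a standard runtime must resolve again each time; this is the repeated work that a recorded TDG is meant to avoid.

```c
/* Hypothetical example: a three-stage chain a -> b -> c built from
 * standard OpenMP tasks and executed `reps` times. Compile with
 * -fopenmp; without it the pragmas are ignored and the code runs
 * serially with the same result. */
long run_chain(int reps)
{
    long a = 0, b = 0, c = 0;

    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < reps; ++i) {
            /* Stage 1: produce a. */
            #pragma omp task depend(out: a) shared(a) firstprivate(i)
            a = i + 1;

            /* Stage 2: consume a, produce b. */
            #pragma omp task depend(in: a) depend(out: b) shared(a, b)
            b = a * 2;

            /* Stage 3: consume b, accumulate into c. */
            #pragma omp task depend(in: b) depend(inout: c) shared(b, c)
            c += b;
        }
        /* Wait for all tasks created by this thread. */
        #pragma omp taskwait
    }
    return c;
}
```

Calling `run_chain(3)` adds 2, 4, and 6 into `c` and returns 12. The graph has the same shape in every iteration, so this is the sort of region where recording the graph once and replaying it could pay off, under the assumptions stated above.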
