Paper Title

Pathways: Asynchronous Distributed Dataflow for ML

Authors

Paul Barham, Aakanksha Chowdhery, Jeff Dean, Sanjay Ghemawat, Steven Hand, Dan Hurt, Michael Isard, Hyeontaek Lim, Ruoming Pang, Sudip Roy, Brennan Saeta, Parker Schuh, Ryan Sepassi, Laurent El Shafey, Chandramohan A. Thekkath, Yonghui Wu

Abstract

We present the design of a new large scale orchestration layer for accelerators. Our system, Pathways, is explicitly designed to enable exploration of new systems and ML research ideas, while retaining state of the art performance for current models. Pathways uses a sharded dataflow graph of asynchronous operators that consume and produce futures, and efficiently gang-schedules heterogeneous parallel computations on thousands of accelerators while coordinating data transfers over their dedicated interconnects. Pathways makes use of a novel asynchronous distributed dataflow design that lets the control plane execute in parallel despite dependencies in the data plane. This design, with careful engineering, allows Pathways to adopt a single-controller model that makes it easier to express complex new parallelism patterns. We demonstrate that Pathways can achieve performance parity (~100% accelerator utilization) with state-of-the-art systems when running SPMD computations over 2048 TPUs, while also delivering throughput comparable to the SPMD case for Transformer models that are pipelined across 16 stages, or sharded across two islands of accelerators connected over a data center network.
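To illustrate the idea of a single-controller, future-based dataflow described in the abstract, here is a minimal sketch using only Python's standard library. It is not Pathways' actual API: the stage names and the `ThreadPoolExecutor` standing in for accelerator shards are assumptions for illustration. The key point it demonstrates is that the control plane (the loop submitting work) runs ahead and returns futures immediately, while data-plane dependencies are resolved later by waiting on upstream futures.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a pool of accelerator shards; in Pathways the
# controller dispatches to real TPU islands over their interconnects.
executor = ThreadPoolExecutor(max_workers=4)

def run_stage(name, *inputs):
    """Consume upstream futures (data-plane dependency), produce a value."""
    vals = [f.result() for f in inputs]  # block only inside the worker
    return sum(vals) + 1                 # placeholder computation

# Control plane: every submit() returns a future immediately, so the
# whole pipeline is enqueued without waiting for any stage to finish.
a = executor.submit(run_stage, "embed")
b = executor.submit(run_stage, "layer1", a)
c = executor.submit(run_stage, "layer2", b)

print(c.result())  # only the final consumer blocks on the data plane
```

The asynchronous-dispatch pattern here is what lets a centralized controller express complex parallelism (pipelining, sharding across islands) without the control plane itself becoming the bottleneck.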
