Paper Title

The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism

Authors

Yosuke Oyama, Naoya Maruyama, Nikoli Dryden, Erin McCarthy, Peter Harrington, Jan Balewski, Satoshi Matsuoka, Peter Nugent, Brian Van Essen

Abstract

We present scalable hybrid-parallel algorithms for training large-scale 3D convolutional neural networks. Emerging deep learning-based scientific workflows often require model training with large, high-dimensional samples, which can make training much more costly and even infeasible due to excessive memory usage. We solve these challenges by extensively applying hybrid parallelism throughout the end-to-end training pipeline, including both computations and I/O. Our hybrid-parallel algorithm extends the standard data parallelism with spatial parallelism, which partitions a single sample in the spatial domain, realizing strong scaling beyond the mini-batch dimension with a larger aggregated memory capacity. We evaluate our proposed training algorithms with two challenging 3D CNNs, CosmoFlow and 3D U-Net. Our comprehensive performance studies show that good weak and strong scaling can be achieved for both networks using up to 2K GPUs. More importantly, we enable training of CosmoFlow with much larger samples than previously possible, realizing an order-of-magnitude improvement in prediction accuracy.
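To make the spatial-parallelism idea in the abstract concrete, below is a minimal, self-contained sketch (not the authors' implementation) of how a single 3D sample can be partitioned along one spatial axis across workers, with each worker convolving its subvolume locally after receiving halo slices from its neighbors. All function names (`conv3d_valid`, `spatially_parallel_conv`) and sizes are hypothetical, chosen only for illustration; a real system would distribute shards across GPUs and exchange halos via communication rather than slicing a shared array.

```python
# Illustrative sketch of spatial parallelism for a 3D convolution:
# partition one sample along the depth axis, give each worker a halo,
# convolve locally, and check the stitched result against the serial one.
import numpy as np

def conv3d_valid(x, k):
    """Naive single-channel 3D convolution with 'valid' output, for illustration only."""
    kd, kh, kw = k.shape
    out = np.zeros((x.shape[0] - kd + 1, x.shape[1] - kh + 1, x.shape[2] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for l in range(out.shape[2]):
                out[i, j, l] = np.sum(x[i:i + kd, j:j + kh, l:l + kw] * k)
    return out

def spatially_parallel_conv(sample, kernel, num_workers):
    """Partition `sample` along axis 0 across `num_workers` and convolve each shard locally."""
    r = kernel.shape[0] // 2
    padded = np.pad(sample, r)  # global zero padding, so the stitched result equals a 'same' conv
    bounds = np.linspace(0, sample.shape[0], num_workers + 1, dtype=int)
    shards = []
    for w in range(num_workers):
        lo, hi = bounds[w], bounds[w + 1]
        # Each worker owns output slices [lo, hi); the extra 2*r slices are the halo
        # that, on a real machine, would be exchanged with neighboring ranks.
        local = padded[lo:hi + 2 * r]
        shards.append(conv3d_valid(local, kernel))
    return np.concatenate(shards, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.standard_normal((16, 8, 8))   # tiny stand-in for a large 3D cosmology volume
    kernel = rng.standard_normal((3, 3, 3))
    serial = conv3d_valid(np.pad(sample, 1), kernel)
    parallel = spatially_parallel_conv(sample, kernel, num_workers=4)
    assert np.allclose(serial, parallel)
    print("spatially partitioned convolution matches the serial result")
```

The sketch shows why spatial parallelism enables strong scaling beyond the mini-batch dimension: each worker holds and computes on only a fraction of the sample, at the cost of a halo exchange per convolution.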
