Paper Title

GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing

Authors

Yuke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Ang Li, Yufei Ding

Abstract

With the increasing popularity of robotics in industrial control and autonomous driving, deep reinforcement learning (DRL) has attracted attention from various fields. However, DRL computation on modern powerful GPU platforms remains inefficient due to its heterogeneous workloads and interleaved execution paradigm. To this end, we propose GMI-DRL, a systematic design to accelerate multi-GPU DRL via GPU spatial multiplexing. We introduce a novel design of resource-adjustable GPU multiplexing instances (GMIs) to match the actual needs of DRL tasks, an adaptive GMI management strategy to simultaneously achieve high GPU utilization and computation throughput, and highly efficient inter-GMI communication support to meet the demands of various DRL communication patterns. Comprehensive experiments reveal that GMI-DRL outperforms the state-of-the-art NVIDIA Isaac Gym with NCCL (by up to 2.81X) and Horovod (by up to 2.34X) support in training throughput on the latest DGX-A100 platform. Our work provides an initial user experience with GPU spatial multiplexing for processing heterogeneous workloads with a mixture of computation and communication.
