Paper Title

A Hybrid Multi-GPU Implementation of Simplex Algorithm with CPU Collaboration

Authors

Basilis Mamalis, Marios Perlitis

Abstract

The simplex algorithm has been used successfully for many years to solve linear programming (LP) problems. Because of the intensive computations required (especially for large LP problems), parallel approaches have also been studied extensively. The computational power of modern GPUs and the rapid development of multicore CPU systems have made the OpenMP and CUDA programming models top choices in recent years. However, achieving efficient collaboration between CPU and GPU through the combined use of these programming models is still considered a hard research problem. In this context, we demonstrate a highly efficient implementation of the standard simplex method that aims at the best possible exploitation of all computing resources, used concurrently, on a multicore platform with multiple CUDA-enabled GPUs. More concretely, we present a novel hybrid collaboration scheme based on the concurrent execution of suitably distributed CPU-assigned (via multithreading) and GPU-offloaded computations. Experimental results obtained through the cooperative use of OpenMP and CUDA on a notably powerful modern hybrid platform (32 cores and two high-spec GPUs, a Titan RTX and an RTX 2080 Ti) show that the hybrid GPU/CPU collaboration scheme presented here clearly outperforms the GPU-only implementation under almost all conditions. The corresponding measurements validate the value of using all resources concurrently, even on a multi-GPU platform. Furthermore, the given implementations are fully comparable (and slightly superior in most cases) to other related attempts in the literature, and clearly superior to the native CPU implementation with 32 cores.
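
The abstract gives no implementation details, so the following is only an illustrative sketch of the general idea of splitting one simplex pivot update between two concurrently running workers. It is not the authors' OpenMP/CUDA code. A Python thread pool stands in for both the CPU threads and the GPU, and the names `hybrid_pivot`, `update_columns`, and the `gpu_share` parameter are hypothetical. The tableau columns are divided into two disjoint ranges, and each worker applies the pivot's row operations to its own range.

```python
from concurrent.futures import ThreadPoolExecutor


def update_columns(tableau, pivot_row, pivot_col, r, cols):
    """Apply the pivot's row operations to the given column range only."""
    for i, row in enumerate(tableau):
        if i == r:
            continue
        factor = pivot_col[i]  # entry of row i in the pivot column (snapshot)
        for j in cols:
            row[j] -= factor * pivot_row[j]
    # Finally write the normalized pivot row for this column range.
    for j in cols:
        tableau[r][j] = pivot_row[j]


def hybrid_pivot(tableau, r, c, gpu_share=0.75):
    """Pivot on (r, c), splitting the column work between two workers.

    gpu_share is a hypothetical tuning knob: the fraction of columns given
    to the worker that stands in for the GPU. The rest goes to the worker
    that stands in for the CPU threads.
    """
    n_cols = len(tableau[0])
    pivot = tableau[r][c]
    # Snapshots are taken before any update so both workers read
    # consistent values while writing to disjoint columns.
    pivot_row = [v / pivot for v in tableau[r]]
    pivot_col = [row[c] for row in tableau]

    split = int(n_cols * gpu_share)
    partitions = [range(0, split), range(split, n_cols)]

    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(update_columns, tableau, pivot_row, pivot_col, r, cols)
            for cols in partitions
        ]
        for future in futures:
            future.result()  # re-raise any worker exception
    return tableau
```

In a real hybrid setup the split ratio would presumably be tuned to the relative throughput of the devices. Here it is a fixed assumption chosen only to make the partitioning visible.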
