Title
Towards High Performance Java-based Deep Learning Frameworks
Authors
Abstract
The advent of modern cloud services, along with the huge volume of data produced on a daily basis, has set the demand for fast and efficient data processing. This demand is common among numerous application domains, such as deep learning, data mining, and computer vision. Prior research has focused on employing hardware accelerators as a means to overcome this inefficiency. This trend has driven software development to target heterogeneous execution, and several modern computing systems incorporate a mixture of diverse computing components, including GPUs and FPGAs. However, specializing application code for heterogeneous execution is not a trivial task, as it requires developers to have hardware expertise in order to obtain high performance. The vast majority of existing deep learning frameworks that support heterogeneous acceleration rely on wrapper calls from a high-level programming language to a low-level accelerator backend, such as OpenCL, CUDA, or HLS. In this paper, we employ TornadoVM, a state-of-the-art heterogeneous programming framework, to transparently accelerate Deep Netts, a Java-based deep learning framework. Our initial results demonstrate up to 8x performance speedup when executing the back-propagation process of the network's training on AMD GPUs, compared against the sequential execution of the original Deep Netts framework.