Paper Title

Stream Iterative Distributed Coded Computing for Learning Applications in Heterogeneous Systems

Authors

Esfahanizadeh, Homa; Cohen, Alejandro; Medard, Muriel

Abstract

To improve the utility of learning applications and render machine learning solutions feasible for complex applications, a substantial amount of heavy computation is needed. Thus, it is essential to delegate the computations among several workers, which brings up the major challenge of coping with delays and failures caused by the system's heterogeneity and uncertainties. In particular, minimizing the end-to-end in-order job execution delay, from arrival to delivery, is of great importance for real-world delay-sensitive applications. In this paper, for the computation of each job iteration in a stochastic heterogeneous distributed system where the workers vary in their computing and communication power, we present a novel joint scheduling-coding framework that optimally splits the coded computational load among the workers. This closes the gap between the workers' response times and is critical for maximizing resource utilization. To further reduce the in-order execution delay, we also incorporate redundant computations in each iteration of a distributed computational job. Our simulation results demonstrate that the delay obtained using the proposed solution is dramatically lower than that of the uniform split, which is oblivious to the system's heterogeneity, and is in fact very close to an ideal lower bound after introducing only a small percentage of redundant computations.
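
The contrast between a uniform split and a heterogeneity-aware split can be illustrated with a toy model. The sketch below is not the paper's framework: it assumes deterministic, known worker rates, ignores coding, redundancy, and communication delay, and uses hypothetical function names and numbers. It only shows why matching each worker's load to its speed narrows the gap between response times.

```python
def uniform_split(total_load, rates):
    """Give every worker the same share, ignoring its speed."""
    share = total_load / len(rates)
    return [share for _ in rates]


def proportional_split(total_load, rates):
    """Give each worker a share proportional to its (assumed known) rate."""
    total_rate = sum(rates)
    return [total_load * r / total_rate for r in rates]


def completion_time(loads, rates):
    """Without redundancy, the iteration ends when the slowest worker finishes."""
    return max(load / rate for load, rate in zip(loads, rates))


# Hypothetical rates (units of work per second) and total load.
rates = [1.0, 2.0, 5.0]
total_load = 80.0

t_uniform = completion_time(uniform_split(total_load, rates), rates)
t_proportional = completion_time(proportional_split(total_load, rates), rates)
print(t_uniform, t_proportional)
```

In this toy setting the proportional split makes all workers finish at the same time, whereas the uniform split leaves the fast workers idle while the slowest one finishes. The stochastic setting described in the abstract, where response times are random and redundant coded computations absorb stragglers, is beyond what this sketch models.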
