Paper Title

Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks

Paper Authors

Zhishuai Guo, Mingrui Liu, Zhuoning Yuan, Li Shen, Wei Liu, Tianbao Yang

Paper Abstract

In this paper, we study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model. Although distributed learning techniques have been investigated extensively in deep learning, they are not directly applicable to stochastic AUC maximization with deep neural networks because of its striking differences from standard loss-minimization problems (e.g., cross-entropy minimization). To address this challenge, we propose and analyze a communication-efficient distributed optimization algorithm based on a non-convex concave reformulation of AUC maximization, in which the primal variable and the dual variable are communicated between each worker and the parameter server only after multiple steps of gradient-based updates at each worker. Compared with the naive parallel version of an existing algorithm, which computes stochastic gradients at individual machines and averages them to update the model parameters, our algorithm requires far fewer communication rounds while still achieving a linear speedup in theory. To the best of our knowledge, this is the first work that solves the non-convex concave min-max problem of AUC maximization with deep neural networks in a communication-efficient distributed manner while maintaining the linear-speedup property in theory. Our experiments on several benchmark datasets demonstrate the effectiveness of our algorithm and confirm our theory.
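For readers unfamiliar with the reformulation the abstract refers to: this line of work builds on the square-loss min-max reformulation of AUC maximization due to Ying et al. (2016), sketched below with h(w; x) denoting the network's score and p = Pr(y = 1); the exact objective used in the paper may differ in details.

\min_{\mathbf{w}, a, b}\ \max_{\alpha \in \mathbb{R}}\ \mathbb{E}_{(\mathbf{x}, y)}\Big[(1-p)\big(h(\mathbf{w};\mathbf{x})-a\big)^2\,\mathbb{I}[y=1] + p\big(h(\mathbf{w};\mathbf{x})-b\big)^2\,\mathbb{I}[y=-1] + 2(1+\alpha)\big(p\,h(\mathbf{w};\mathbf{x})\,\mathbb{I}[y=-1] - (1-p)\,h(\mathbf{w};\mathbf{x})\,\mathbb{I}[y=1]\big) - p(1-p)\,\alpha^2\Big]

The objective is non-convex in (w, a, b) when h is a deep network and concave (a quadratic with negative leading coefficient) in the scalar dual variable alpha, which is what makes standard distributed loss-minimization machinery inapplicable here.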
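The communication pattern described in the abstract can be illustrated with a minimal simulation: each worker takes several local stochastic primal-descent/dual-ascent steps on the objective above, and only then are the primal variables (w, a, b) and the dual variable alpha averaged at the server. This is an illustrative NumPy sketch with a linear scorer standing in for the deep network; all names (n_workers, local_steps, eta, ...) are hypothetical and p is assumed known, so it is not the paper's exact algorithm.

# Sketch of the communication pattern: each worker runs multiple local
# primal-dual stochastic gradient steps on the min-max AUC objective, and
# primal/dual variables are averaged across workers only once per round.
import numpy as np

rng = np.random.default_rng(0)
n_workers, local_steps, rounds, d = 4, 8, 50, 10
eta = 0.05   # primal/dual step size
p = 0.5      # fraction of positives (assumed known in this sketch)

# toy data shards: (features, labels in {+1, -1}) per worker
def make_shard(n=512):
    X = rng.normal(size=(n, d))
    y = np.sign(X @ rng.normal(size=d) + 0.1 * rng.normal(size=n))
    return X, y
shards = [make_shard() for _ in range(n_workers)]

# server state: primal (w, a, b) and dual alpha, replicated to workers
w, a, b, alpha = np.zeros(d), 0.0, 0.0, 0.0

for r in range(rounds):
    locals_ = [(w.copy(), a, b, alpha) for _ in range(n_workers)]
    for k in range(n_workers):
        wk, ak, bk, alk = locals_[k]
        X, y = shards[k]
        for _ in range(local_steps):   # local steps, no communication
            i = rng.integers(len(y))
            x, yi = X[i], y[i]
            h = wk @ x                 # score h(w; x); linear stand-in for the net
            pos, neg = float(yi == 1), float(yi == -1)
            # stochastic gradients of the square-loss min-max objective
            gw = (2 * (1 - p) * (h - ak) * pos
                  + 2 * p * (h - bk) * neg
                  + 2 * (1 + alk) * (p * neg - (1 - p) * pos)) * x
            ga = -2 * (1 - p) * (h - ak) * pos
            gb = -2 * p * (h - bk) * neg
            galpha = 2 * (p * h * neg - (1 - p) * h * pos) - 2 * p * (1 - p) * alk
            wk -= eta * gw; ak -= eta * ga; bk -= eta * gb   # descent on primal
            alk += eta * galpha                              # ascent on dual
        locals_[k] = (wk, ak, bk, alk)
    # one communication round: average primal and dual variables at the server
    w = np.mean([l[0] for l in locals_], axis=0)
    a = float(np.mean([l[1] for l in locals_]))
    b = float(np.mean([l[2] for l in locals_]))
    alpha = float(np.mean([l[3] for l in locals_]))

In the naive parallel baseline the averaging step would instead run after every single gradient step (local_steps = 1), which is exactly the communication overhead the paper's algorithm avoids.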
