Paper Title
Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence
Paper Authors
Paper Abstract
The stochastic heavy ball method (SHB), also known as stochastic gradient descent (SGD) with Polyak's momentum, is widely used in training neural networks. However, despite the remarkable success of such an algorithm in practice, its theoretical characterization remains limited. In this paper, we focus on neural networks with two and three layers and provide a rigorous understanding of the properties of the solutions found by SHB: \emph{(i)} stability after dropping out part of the neurons, \emph{(ii)} connectivity along a low-loss path, and \emph{(iii)} convergence to the global optimum. To achieve this goal, we take a mean-field view and relate the SHB dynamics to a certain partial differential equation in the limit of large network widths. This mean-field perspective has inspired a recent line of work focusing on SGD, whereas our paper considers an algorithm with momentum. More specifically, after proving existence and uniqueness of the limiting differential equation, we show convergence to the global optimum and give a quantitative bound between the mean-field limit and the SHB dynamics of a finite-width network. Armed with this last bound, we are able to establish the dropout-stability and connectivity of SHB solutions.
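For reference, a standard discrete-time form of the stochastic heavy ball update is sketched below; this is one common parameterization with step size $\eta > 0$, momentum coefficient $\beta \in [0,1)$, and stochastic gradient $\nabla \ell(\theta_k; z_k)$ evaluated on the sample $z_k$ drawn at step $k$, and the exact scaling and notation used in the paper may differ:
\[
  \theta_{k+1} = \theta_k - \eta\, \nabla \ell(\theta_k; z_k) + \beta\, (\theta_k - \theta_{k-1}),
\]
which is equivalent to maintaining a momentum buffer $m_{k+1} = \beta\, m_k - \eta\, \nabla \ell(\theta_k; z_k)$ and setting $\theta_{k+1} = \theta_k + m_{k+1}$; setting $\beta = 0$ recovers plain SGD.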