Paper Title

SGD Distributional Dynamics of Three Layer Neural Networks

Paper Authors

Victor Luo, Yazhen Wang, Glenn Fung

Paper Abstract

With the rise of big data analytics, multi-layer neural networks have surfaced as one of the most powerful machine learning methods. However, their theoretical mathematical properties are still not fully understood. Training a neural network requires optimizing a non-convex objective function, typically done using stochastic gradient descent (SGD). In this paper, we seek to extend the mean field results of Mei et al. (2018) from two-layer neural networks with one hidden layer to three-layer neural networks with two hidden layers. We will show that the SGD dynamics is captured by a set of non-linear partial differential equations, and prove that the distributions of weights in the two hidden layers are independent. We will also detail exploratory work done based on simulation and real-world data.
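For context, the two-layer result being extended can be sketched as follows; this summary paraphrases Mei et al. (2018), and the notation is illustrative rather than this paper's own. For a network \hat{y}(x) = \frac{1}{N} \sum_{i=1}^{N} \sigma_*(x; \theta_i), the empirical distribution of the weights \theta_i after \lfloor t/\varepsilon \rfloor SGD steps with step size \varepsilon converges, as N \to \infty and \varepsilon \to 0, to a limit \rho_t solving a non-linear PDE of the form

\[
\partial_t \rho_t = 2\xi(t)\,\nabla_\theta \cdot \big( \rho_t\, \nabla_\theta \Psi(\theta; \rho_t) \big),
\qquad
\Psi(\theta; \rho) = V(\theta) + \int U(\theta, \theta')\, \rho(\mathrm{d}\theta'),
\]

where V(\theta) = -\mathbb{E}[y\, \sigma_*(x; \theta)], U(\theta, \theta') = \mathbb{E}[\sigma_*(x; \theta)\, \sigma_*(x; \theta')], and \xi(t) rescales the step size. The present paper derives the analogous distributional dynamics when there are two hidden layers, and additionally proves that the two layers' weight distributions are independent.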
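The abstract also mentions exploratory work based on simulation. A minimal sketch of the training procedure being analyzed, assuming a squared loss, ReLU activations, and one-sample SGD on synthetic data (all illustrative choices, not the paper's exact setup), might look as follows:

import numpy as np

# Three-layer network (two hidden layers) trained with one-sample SGD.
# Sizes, loss, activation, and data model are illustrative assumptions.
rng = np.random.default_rng(0)
d, n1, n2 = 10, 32, 32                       # input dim and hidden widths
W1 = rng.normal(0, 1/np.sqrt(d),  (n1, d))   # first hidden layer
W2 = rng.normal(0, 1/np.sqrt(n1), (n2, n1))  # second hidden layer
a  = rng.normal(0, 1/np.sqrt(n2), (n2,))     # output weights

def relu(z):
    return np.maximum(z, 0.0)

def forward(x):
    h1 = relu(W1 @ x)
    h2 = relu(W2 @ h1)
    return h1, h2, a @ h2                    # scalar prediction

lr = 0.05
for step in range(10_000):
    x = rng.normal(size=d)                   # fresh sample each step (online SGD)
    y = np.sin(x[0]) + 0.1 * rng.normal()    # synthetic regression target
    h1, h2, yhat = forward(x)
    err = yhat - y                           # d/dyhat of 0.5 * (yhat - y)**2
    g2 = err * a * (h2 > 0)                  # backprop through second hidden layer
    g1 = (W2.T @ g2) * (h1 > 0)              # backprop through first hidden layer
    a  -= lr * err * h2
    W2 -= lr * np.outer(g2, h1)
    W1 -= lr * np.outer(g1, x)

To connect such a simulation to the theory, one would record the empirical distributions of the rows of W1 and W2 over training and check that they co-evolve as the PDE predicts while remaining (approximately) independent across the two layers.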
