Paper Title

Particle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis

Paper Authors

Atsushi Nitanda, Denny Wu, Taiji Suzuki

Paper Abstract

We propose the particle dual averaging (PDA) method, which generalizes the dual averaging method in convex optimization to optimization over probability distributions with quantitative runtime guarantees. The algorithm consists of an inner loop and an outer loop: the inner loop utilizes the Langevin algorithm to approximately solve for a stationary distribution, which is then optimized in the outer loop. The method can thus be interpreted as an extension of the Langevin algorithm that naturally handles nonlinear functionals on the probability space. An important application of the proposed method is the optimization of neural networks in the mean field regime, which is theoretically attractive due to the presence of nonlinear feature learning, but for which quantitative convergence rates can be challenging to obtain. By adapting finite-dimensional convex optimization theory to the space of measures, we analyze PDA in regularized empirical / expected risk minimization, and establish quantitative global convergence in learning two-layer mean field neural networks under more general settings. Our theoretical results are supported by numerical simulations on neural networks of reasonable size.
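To make the two-loop structure concrete, below is a minimal NumPy sketch of a PDA-style optimizer for a two-layer mean field network. It is an illustration under stated assumptions, not the authors' exact algorithm: the tanh network, squared loss, the 2/(t+1) averaging weights, the step size, and the loop counts are all hypothetical choices made for readability, and no schedule here carries the paper's convergence guarantees.

```python
import numpy as np

# Sketch of particle dual averaging (PDA) for a two-layer mean field network
# f(x) = (1/m) * sum_j a_j * tanh(w_j . x), trained with squared loss and an
# entropic (Gaussian-prior) regularizer of strength lam. All hyperparameters
# below are illustrative assumptions, not the paper's prescribed schedule.

rng = np.random.default_rng(0)
n, d, m = 200, 5, 512                    # samples, input dim, particles
X = rng.standard_normal((n, d))
y = np.tanh(X @ rng.standard_normal(d))  # synthetic regression targets

theta = rng.standard_normal((m, d + 1))  # each particle: (a_j, w_j)

def predict(theta, X):
    a, w = theta[:, 0], theta[:, 1:]
    return np.tanh(X @ w.T) @ a / len(theta)

lam, eta = 1e-2, 1e-2
resid_avg = np.zeros(n)                  # dual average of loss derivatives dl/df

for t in range(1, 51):                   # outer loop: dual averaging
    resid = predict(theta, X) - y        # dl/df for squared loss
    resid_avg = (1 - 2 / (t + 1)) * resid_avg + (2 / (t + 1)) * resid

    # Inner loop: Langevin algorithm, approximately sampling from
    # exp(-g_t(theta) / lam), where g_t is the averaged linear potential
    # g_t(theta) = (1/n) * sum_i resid_avg_i * h(theta, x_i) + lam*|theta|^2/2.
    for _ in range(20):
        a, w = theta[:, 0], theta[:, 1:]
        act = np.tanh(X @ w.T)                                   # (n, m)
        g_a = act.T @ resid_avg / n                              # d g_t / d a
        g_w = ((1 - act**2) * resid_avg[:, None]).T @ X * a[:, None] / n
        grad = np.concatenate([g_a[:, None], g_w], axis=1)
        theta = theta - eta * (grad + lam * theta)               # drift step
        theta = theta + np.sqrt(2 * eta * lam) * rng.standard_normal(theta.shape)

print("final squared loss:", np.mean((predict(theta, X) - y) ** 2))
```

The key design point the sketch tries to reflect: the outer loop averages only the loss derivatives, so the inner Langevin potential is linear in the particle distribution. This is what lets a plain Langevin sampler handle an objective that is a nonlinear functional of the distribution, as the abstract describes.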
