Paper Title
Dual Stochastic Natural Gradient Descent and convergence of interior half-space gradient approximations
Paper Authors
Paper Abstract
The multinomial logistic regression (MLR) model is widely used in statistics and machine learning. Stochastic gradient descent (SGD) is the most common approach for determining the parameters of an MLR model in big data scenarios. However, SGD has slow sub-linear rates of convergence. A way to improve these rates of convergence is to use manifold optimization. Along this line, stochastic natural gradient descent (SNGD), proposed by Amari, was proven to be Fisher efficient when it converged. However, SNGD is not guaranteed to converge and it is computationally too expensive for MLR models with a large number of parameters. Here, we propose a stochastic optimization method for MLR based on manifold optimization concepts which (i) has per-iteration computational complexity linear in the number of parameters and (ii) can be proven to converge. To achieve (i) we establish that the family of joint distributions for MLR is a dually flat manifold and we use that to speed up calculations. Sánchez-López and Cerquides have recently introduced convergent stochastic natural gradient descent (CSNGD), a variant of SNGD whose convergence is guaranteed. To obtain (ii) our algorithm uses the fundamental idea from CSNGD, thus relying on an independent sequence to build a bounded approximation of the natural gradient. We call the resulting algorithm dual stochastic natural gradient descent (DSNGD). By generalizing a result from Sunehag et al., we prove that DSNGD converges. Furthermore, we prove that the computational complexity of DSNGD iterations is linear in the number of variables of the model.
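To make the computational motivation concrete, the following is a minimal, illustrative Python sketch, not the paper's DSNGD algorithm: it contrasts a plain SGD step for multinomial logistic regression with a naive natural-gradient step that preconditions the gradient by an inverse of a Fisher estimate. All function names, the damping term, and the placeholder Fisher matrix are assumptions introduced here for illustration only.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sgd_step(W, x, y, lr=0.1):
    """One plain SGD step for multinomial logistic regression on a sample (x, y).

    W: (n_classes, n_features) weight matrix; y: integer class label.
    Cost is linear in the number of parameters.
    """
    p = softmax(W @ x)          # predicted class probabilities
    g = np.outer(p, x)          # gradient of the negative log-likelihood ...
    g[y] -= x                   # ... with the one-hot correction for the true class
    return W - lr * g

def naive_natural_gradient_step(W, x, y, fisher, lr=0.1, damping=1e-3):
    """Naive natural-gradient step: precondition the gradient with an
    (approximate) inverse Fisher information matrix.

    Solving the d x d linear system is cubic in the number of parameters d,
    which is the kind of per-iteration cost a linear-time scheme avoids.
    """
    p = softmax(W @ x)
    g = np.outer(p, x)
    g[y] -= x
    d = g.size
    precond = np.linalg.solve(fisher + damping * np.eye(d), g.ravel())
    return W - lr * precond.reshape(W.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_classes, n_features = 3, 5
    W = np.zeros((n_classes, n_features))
    x = rng.normal(size=n_features)
    y = 1
    W = sgd_step(W, x, y)
    fisher = np.eye(n_classes * n_features)   # placeholder Fisher estimate
    W = naive_natural_gradient_step(W, x, y, fisher)
```

The contrast is only meant to illustrate why the per-iteration cost of natural-gradient methods matters at scale; how DSNGD attains a convergent, linear-cost update via the dually flat structure and an independent sequence is developed in the paper itself.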