Paper Title

Eigenvalue-corrected Natural Gradient Based on a New Approximation

Paper Authors

Kai-Xin Gao, Xiao-Lei Liu, Zheng-Hai Huang, Min Wang, Shuangling Wang, Zidong Wang, Dachuan Xu, Fan Yu

Paper Abstract

Using second-order optimization methods to train deep neural networks (DNNs) has attracted many researchers. A recently proposed method, Eigenvalue-corrected Kronecker Factorization (EKFAC) (George et al., 2018), interprets the natural-gradient update as a diagonal method and corrects the inaccurate re-scaling factors in the Kronecker-factored eigenbasis. Gao et al. (2020) consider a new approximation to the natural gradient, which approximates the Fisher information matrix (FIM) by a constant multiple of the Kronecker product of two matrices and keeps the trace equal before and after the approximation. In this work, we combine the ideas of these two methods and propose Trace-restricted Eigenvalue-corrected Kronecker Factorization (TEKFAC). The proposed method not only corrects the inexact re-scaling factors under the Kronecker-factored eigenbasis, but also adopts the new approximation and the effective damping technique proposed in Gao et al. (2020). We also discuss the differences and relationships among the Kronecker-factored approximations. Empirically, our method outperforms SGD with momentum, Adam, EKFAC, and TKFAC on several DNNs.
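The abstract combines two ingredients: a trace-restricted Kronecker approximation of the per-layer FIM (the TKFAC idea) and an eigenvalue correction in the Kronecker-factored eigenbasis (the EKFAC idea). The following is a minimal NumPy sketch of both pieces for a single fully-connected layer, assuming row-major flattening of the weight gradient (so the block FIM is approximated by B ⊗ A). All variable names are illustrative; this is not the authors' implementation, and the paper's actual factor updates and damping differ in detail.

```python
import numpy as np

# Toy per-layer statistics for one fully-connected layer (illustrative only).
rng = np.random.default_rng(0)
n, d_in, d_out = 64, 8, 6
a = rng.standard_normal((n, d_in))    # layer inputs (activations)
g = rng.standard_normal((n, d_out))   # gradients w.r.t. layer outputs

# KFAC-style factors: A ~ E[a a^T], B ~ E[g g^T].
A = a.T @ a / n
B = g.T @ g / n

# Per-sample weight gradients g_i a_i^T, flattened row-major,
# so the block FIM F = E[vec(grad) vec(grad)^T] is approximated by B kron A.
G = np.einsum('no,ni->noi', g, a).reshape(n, -1)

# Trace restriction (TKFAC idea): pick a scalar pi so that the trace of
# pi * (B kron A) matches tr(F), using tr(B kron A) = tr(B) * tr(A).
trace_F = np.mean(np.sum(G * G, axis=1))      # tr(F) = E[||grad||^2]
pi = trace_F / (np.trace(B) * np.trace(A))

# Eigenbasis of the Kronecker-factored approximation.
SB, UB = np.linalg.eigh(B)
SA, UA = np.linalg.eigh(A)
Q = np.kron(UB, UA)                           # eigenvectors of B kron A

# For comparison: the trace-restricted Kronecker eigenvalues before correction.
s_tkfac = pi * np.kron(SB, SA)

# Eigenvalue correction (EKFAC idea): replace the Kronecker-structured
# eigenvalues by the gradient's second moments in the eigenbasis.
s2 = np.mean((G @ Q) ** 2, axis=0)

# Preconditioned update for the mean gradient; plain additive damping is
# used here, whereas TEKFAC uses the damping technique of Gao et al. (2020).
damping = 1e-3
grad = G.mean(axis=0)
precond_grad = Q @ ((Q.T @ grad) / (s2 + damping))
```

One way to see how the two ideas interact in this sketch: the corrected eigenvalues satisfy sum(s2) == trace_F by construction, so the eigenvalue correction automatically preserves the trace that TKFAC restricts.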
