Paper Title
Input Normalized Stochastic Gradient Descent Training of Deep Neural Networks
Paper Authors
Paper Abstract
In this paper, we propose a novel optimization algorithm for training machine learning models called Input Normalized Stochastic Gradient Descent (INSGD), inspired by the Normalized Least Mean Squares (NLMS) algorithm used in adaptive filtering. When training complex models on large datasets, the choice of optimizer parameters, particularly the learning rate, is crucial to avoid divergence. Our algorithm updates the network weights using stochastic gradient descent with $\ell_1$ and $\ell_2$-based normalizations applied to the learning rate, similar to NLMS. However, unlike existing normalization methods, we exclude the error term from the normalization process and instead normalize the update term using the input vector to the neuron. Our experiments demonstrate that our optimization algorithm achieves higher accuracy levels across different initialization settings. We evaluate the efficiency of our training algorithm on benchmark datasets using ResNet-18, WResNet-20, ResNet-50, and a toy neural network. Our INSGD algorithm improves the accuracy of ResNet-18 on CIFAR-10 from 92.42\% to 92.71\%, WResNet-20 on CIFAR-100 from 76.20\% to 77.39\%, and ResNet-50 on ImageNet-1K from 75.52\% to 75.67\%.
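As a rough illustration of the update rule sketched in the abstract, the Python snippet below applies an input-normalized gradient step to a single linear neuron: the step size is scaled by the $\ell_1$ or $\ell_2$ norm of the neuron's input rather than by the error term, mirroring the stated departure from classical NLMS. The function name `insgd_step`, the `eps` stabilizer, and the learning rate value are illustrative assumptions and not the paper's exact formulation.

```python
import numpy as np

def insgd_step(w, x, grad, lr=0.1, eps=1e-8, norm="l2"):
    """One hypothetical input-normalized update for a single neuron.

    The learning rate is divided by the norm of the neuron's input x
    (not by the error term, as in classical NLMS). The exact INSGD
    update in the paper may differ; lr, eps, and the l1/l2 switch
    here are illustrative assumptions.
    """
    if norm == "l1":
        scale = eps + np.abs(x).sum()   # l1-based normalization
    else:
        scale = eps + np.dot(x, x)      # l2-based normalization
    return w - (lr / scale) * grad

# Illustrative usage on a single linear neuron with squared loss.
rng = np.random.default_rng(0)
w = rng.standard_normal(4)
x = rng.standard_normal(4)
y = 1.0
err = np.dot(w, x) - y      # prediction error
grad = err * x              # gradient of 0.5 * err**2 w.r.t. w
w = insgd_step(w, x, grad)
```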