Paper Title
Temporal Efficient Training of Spiking Neural Network via Gradient Re-weighting
Paper Authors
Paper Abstract
Recently, brain-inspired spiking neural networks (SNNs) have attracted widespread research interest because of their event-driven and energy-efficient characteristics. Still, it is difficult to train deep SNNs efficiently due to the non-differentiability of their activation function, which disables the gradient descent approaches typically used for traditional artificial neural networks (ANNs). Although the adoption of a surrogate gradient (SG) formally allows the back-propagation of losses, the discrete spiking mechanism actually differentiates the loss landscape of SNNs from that of ANNs, preventing surrogate gradient methods from achieving accuracy comparable to ANNs. In this paper, we first analyze why the current direct training approach with surrogate gradients results in SNNs with poor generalizability. Then we introduce the temporal efficient training (TET) approach to compensate for the loss of momentum in gradient descent with SG, so that the training process can converge into flatter minima with better generalizability. Meanwhile, we demonstrate that TET improves the temporal scalability of SNNs and enables temporally inheritable training for acceleration. Our method consistently outperforms the SOTA on all reported mainstream datasets, including CIFAR-10/100 and ImageNet. Remarkably, on DVS-CIFAR10 we obtain 83$\%$ top-1 accuracy, an improvement of over 10$\%$ compared to the existing state of the art. Code is available at \url{https://github.com/Gus-Lab/temporal_efficient_training}.
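Below is a minimal PyTorch-style sketch of the two ingredients the abstract refers to: a surrogate gradient that replaces the non-differentiable spike during back-propagation, and a TET-style loss that averages the classification loss over every time step instead of classifying only the time-averaged output. The names (SurrogateSpike, tet_loss), the triangular surrogate shape, and the tensor layout are illustrative assumptions and are not taken from the paper or its released code.

import torch
import torch.nn.functional as F

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass; triangular surrogate gradient in the backward pass."""

    @staticmethod
    def forward(ctx, mem, threshold=1.0):
        ctx.save_for_backward(mem)
        ctx.threshold = threshold
        # Emit a binary spike wherever the membrane potential crosses the threshold.
        return (mem >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        (mem,) = ctx.saved_tensors
        # Triangular surrogate: gradient is largest near the firing threshold and zero far from it.
        surrogate = torch.clamp(1.0 - torch.abs(mem - ctx.threshold), min=0.0)
        return grad_output * surrogate, None

def tet_loss(outputs, target):
    """TET-style objective (assumed form): average the cross-entropy over all T time steps
    rather than applying it once to the time-averaged output."""
    # outputs: [T, batch, num_classes]; target: [batch]
    return torch.stack([F.cross_entropy(o_t, target) for o_t in outputs]).mean()

In this sketch, supervising every time step keeps a useful gradient signal at each simulation step, which is one way to read the abstract's claim that TET compensates for what is lost when gradient descent runs through the surrogate gradient.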