Paper Title
An Empirical Study of the Occurrence of Heavy-Tails in Training a ReLU Gate
Paper Authors
Paper Abstract
A particular direction of recent advances in the study of stochastic deep-learning algorithms has been the uncovering of a rather mysterious heavy-tailed nature of the stationary distribution of these algorithms, even when the data distribution is not heavy-tailed. Moreover, the heavy-tail index is known to show interesting dependence on the input dimension of the net, the mini-batch size, and the step size of the algorithm. In this short note, we undertake an experimental study of this index for S.G.D. while training a ReLU gate (in the realizable and in the binary classification setup) and for a variant of S.G.D. that was proven in Karmakar and Mukherjee (2022) to converge on ReLU-realizable data. From our experiments we conjecture that these two algorithms have similar heavy-tail behaviour on any data where the latter can be proven to converge. Secondly, we demonstrate that the heavy-tail index of the late-time iterates in this model scenario has strikingly different properties from either what has been proven for linear hypothesis classes or what has been previously demonstrated for large nets.
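
To make the object of study concrete, below is a minimal, self-contained sketch of the kind of experiment the abstract describes: mini-batch SGD on a single ReLU gate with squared loss over Gaussian, ReLU-realizable data, followed by a block-based tail-index estimate on the late-time iterates. This is not the paper's actual protocol; all names and parameter values (d, b, eta, n_steps, w_star) are illustrative assumptions, and the estimator is a standard stable-law block estimator commonly used in the heavy-tail SGD literature, chosen here as one reasonable option.

import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical setup: all values below are illustrative choices, not the paper's.
d, b, eta, n_steps = 16, 4, 0.05, 20000        # input dim, mini-batch size, step size, iterations
w_star = rng.standard_normal(d) / np.sqrt(d)   # planted teacher: labels are ReLU(<w_star, x>)
w = rng.standard_normal(d)                     # student weights trained by SGD

late_iterates = []
for t in range(n_steps):
    X = rng.standard_normal((b, d))            # fresh Gaussian mini-batch
    y = relu(X @ w_star)                       # ReLU-realizable labels
    pred = X @ w
    # gradient of the mini-batch squared loss 0.5 * mean (ReLU(<w, x>) - y)^2
    grad = ((relu(pred) - y) * (pred > 0)) @ X / b
    w -= eta * grad
    if t >= n_steps // 2:                      # keep only the late-time iterates
        late_iterates.append(w.copy())

# Block-based tail-index estimate: for centred samples x_1..x_{KL},
# with K-sized blocks summed into y_1..y_L,
#   1/alpha ~= (mean_j log|y_j| - mean_i log|x_i|) / log K.
x = (np.array(late_iterates) - np.mean(late_iterates, axis=0)).ravel()
K = int(np.sqrt(x.size))
L = x.size // K
blocks = x[:K * L].reshape(L, K).sum(axis=1)
inv_alpha = (np.log(np.abs(blocks) + 1e-12).mean()
             - np.log(np.abs(x) + 1e-12).mean()) / np.log(K)
print(f"estimated heavy-tail index alpha ~= {1.0 / inv_alpha:.3f}")

The block estimator is convenient here because it needs no threshold tuning on the iterate sequence; a Hill-type estimator on the tail of |iterate| magnitudes would be a natural alternative diagnostic under the same assumptions.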