Paper Title
Nearest Class-Center Simplification through Intermediate Layers
Paper Authors
Paper Abstract
Recent advances in theoretical Deep Learning have introduced geometric properties that occur during training, past the Interpolation Threshold, the point at which the training error reaches zero. We inquire into the phenomenon coined Neural Collapse in the intermediate layers of the networks, and emphasize the inner workings of Nearest Class-Center Mismatch inside the deepnet. We further show that these processes occur in both vision and language model architectures. Lastly, we propose a Stochastic Variability-Simplification Loss (SVSL) that encourages better geometric features in intermediate layers and improves both training metrics and generalization.
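
As a rough, runnable illustration of the Nearest Class-Center (NCC) rule and the mismatch the abstract refers to, the sketch below evaluates, for one batch, how often the NCC prediction computed from intermediate-layer features disagrees with the network's final prediction, and shows one plausible within-class variability penalty of the kind an SVSL-style loss could attach to a randomly sampled intermediate layer. The function names, the variability measure, the layer-sampling scheme, and the weight lam are illustrative assumptions, not the paper's exact definitions.

import random
import torch

def ncc_mismatch_rate(features, labels, logits):
    # Nearest Class-Center rule: assign each sample to the class of the
    # nearest per-class mean of the intermediate-layer features.
    classes = labels.unique()
    centers = torch.stack([features[labels == c].mean(dim=0) for c in classes])
    ncc_pred = classes[torch.cdist(features, centers).argmin(dim=1)]
    net_pred = logits.argmax(dim=1)
    # Fraction of samples where the NCC decision at this layer disagrees
    # with the deepnet's final decision (the "mismatch").
    return (ncc_pred != net_pred).float().mean().item()

def within_class_variability(features, labels):
    # One common variability measure (illustrative, not the paper's exact
    # term): mean squared distance of samples to their class center.
    classes = labels.unique()
    total = features.new_zeros(())
    for c in classes:
        feats_c = features[labels == c]
        total = total + ((feats_c - feats_c.mean(dim=0)) ** 2).sum(dim=1).mean()
    return total / len(classes)

def svsl_step(intermediate_feats, logits, labels, lam=0.1):
    # Hypothetical training step: intermediate_feats is a list of (N, D)
    # feature tensors, one per monitored layer. Pick one layer at random
    # and add its variability penalty to the usual classification loss.
    layer_feats = random.choice(intermediate_feats)
    ce = torch.nn.functional.cross_entropy(logits, labels)
    return ce + lam * within_class_variability(layer_feats, labels)

In practice the intermediate features would be collected with forward hooks on the layers of interest; sampling a single layer per step keeps the extra cost of the penalty roughly constant regardless of network depth.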