Paper Title
Nearest Class-Center Simplification through Intermediate Layers
Paper Authors
Paper Abstract
Recent advances in theoretical Deep Learning have introduced geometric properties that occur during training, past the Interpolation Threshold, the point at which the training error reaches zero. We inquire into the phenomenon coined Neural Collapse in the intermediate layers of the networks, and emphasize the inner workings of Nearest Class-Center Mismatch inside the deepnet. We further show that these processes occur in both vision and language model architectures. Lastly, we propose a Stochastic Variability-Simplification Loss (SVSL) that encourages better geometric features in intermediate layers and improves both training metrics and generalization.
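
As a rough, runnable illustration of the Nearest Class-Center (NCC) rule and the mismatch the abstract refers to, the sketch below evaluates, for one batch, how often the NCC prediction computed from intermediate-layer features disagrees with the network's final prediction, and shows one plausible within-class variability penalty of the kind an SVSL-style loss could attach to a randomly sampled intermediate layer. The function names, the variability measure, the layer-sampling scheme, and the weight lam are illustrative assumptions, not the paper's exact definitions.

import random
import torch

def ncc_mismatch_rate(features, labels, logits):
    # Nearest Class-Center rule: assign each sample to the class of the
    # nearest per-class mean of the intermediate-layer features.
    classes = labels.unique()
    centers = torch.stack([features[labels == c].mean(dim=0) for c in classes])
    ncc_pred = classes[torch.cdist(features, centers).argmin(dim=1)]
    net_pred = logits.argmax(dim=1)
    # Fraction of samples where the NCC decision at this layer disagrees
    # with the deepnet's final decision (the "mismatch").
    return (ncc_pred != net_pred).float().mean().item()

def within_class_variability(features, labels):
    # One common variability measure (illustrative, not the paper's exact
    # term): mean squared distance of samples to their class center.
    classes = labels.unique()
    total = features.new_zeros(())
    for c in classes:
        feats_c = features[labels == c]
        total = total + ((feats_c - feats_c.mean(dim=0)) ** 2).sum(dim=1).mean()
    return total / len(classes)

def svsl_step(intermediate_feats, logits, labels, lam=0.1):
    # Hypothetical training step: intermediate_feats is a list of (N, D)
    # feature tensors, one per monitored layer. Pick one layer at random
    # and add its variability penalty to the usual classification loss.
    layer_feats = random.choice(intermediate_feats)
    ce = torch.nn.functional.cross_entropy(logits, labels)
    return ce + lam * within_class_variability(layer_feats, labels)

In practice the intermediate features would be collected with forward hooks on the layers of interest; sampling a single layer per step keeps the extra cost of the penalty roughly constant regardless of network depth.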