Paper Title

Extended Unconstrained Features Model for Exploring Deep Neural Collapse

Paper Authors

Tom Tirer, Joan Bruna

Abstract

The modern strategy for training deep neural networks for classification tasks includes optimizing the network's weights even after the training error vanishes, to further push the training loss toward zero. Recently, a phenomenon termed "neural collapse" (NC) has been empirically observed in this training procedure. Specifically, it has been shown that the learned features (the output of the penultimate layer) of within-class samples converge to their mean, and the means of different classes exhibit a certain tight frame structure, which is also aligned with the last layer's weights. Recent papers have shown that minimizers with this structure emerge when optimizing a simplified "unconstrained features model" (UFM) with a regularized cross-entropy loss. In this paper, we further analyze and extend the UFM. First, we study the UFM for the regularized MSE loss, and show that the minimizers' features can have a more delicate structure than in the cross-entropy case. This also affects the structure of the weights. Then, we extend the UFM by adding another layer of weights as well as ReLU nonlinearity to the model and generalize our previous results. Finally, we empirically demonstrate the usefulness of our nonlinear extended UFM in modeling the NC phenomenon that occurs with practical networks.
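For context, the following is a sketch of the objectives and structure described in the abstract, written in the notation commonly used in the UFM literature rather than taken verbatim from the paper; the symbols (W, H, Y, λ, μ_k, μ_G, σ) are illustrative assumptions.

```latex
% Unconstrained features model (UFM): the penultimate-layer features H are
% treated as free optimization variables, jointly trained with the
% last-layer weights W under a regularized loss L (cross-entropy or MSE):
\[
\min_{W,\,H}\;\; \mathcal{L}\bigl(W H,\; Y\bigr)
  \;+\; \frac{\lambda_W}{2}\,\lVert W\rVert_F^2
  \;+\; \frac{\lambda_H}{2}\,\lVert H\rVert_F^2 ,
\]
% where Y encodes the labels (e.g., one-hot targets for the MSE loss) and
% the columns of H are one feature vector h_{k,i} per sample i of class k.

% Neural collapse, informally: within-class features collapse to their class
% mean, and the globally centered class means form a simplex equiangular
% tight frame (ETF), i.e., for K classes and global mean \mu_G,
\[
h_{k,i} \to \mu_k ,
\qquad
\frac{\langle \mu_k - \mu_G,\; \mu_{k'} - \mu_G\rangle}
     {\lVert \mu_k - \mu_G\rVert\,\lVert \mu_{k'} - \mu_G\rVert}
 \;\to\; -\frac{1}{K-1}
\quad (k \neq k').
\]

% Extended UFM (sketch): one more layer of weights and a ReLU nonlinearity
% are placed in front of the free features, as described in the abstract:
\[
\min_{W_2,\,W_1,\,H}\;\; \mathcal{L}\bigl(W_2\,\sigma(W_1 H),\; Y\bigr)
  \;+\; \frac{\lambda_{W_2}}{2}\lVert W_2\rVert_F^2
  \;+\; \frac{\lambda_{W_1}}{2}\lVert W_1\rVert_F^2
  \;+\; \frac{\lambda_H}{2}\lVert H\rVert_F^2 ,
  \qquad \sigma(x)=\max(x,0).
\]
```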
