Paper Title
With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization
Paper Authors
Paper Abstract
Generalization of deep neural networks remains one of the main open problems in machine learning. Previous theoretical work focused on deriving tight bounds on model complexity, while empirical work revealed that neural networks exhibit double descent with respect to both the number of training samples and the network size. In this paper, we empirically examine how different layers of a neural network contribute differently to the model; we find that early layers generally learn representations relevant to performance on both training and testing data. In contrast, deeper layers only minimize training risk and fail to generalize well on testing or mislabeled data. We further illustrate that the distance of the final layers' trained weights from their initial values correlates strongly with generalization error and can serve as an indicator of model overfitting. Moreover, we show evidence supporting post-training regularization by re-initializing the weights of final layers. Our findings provide an efficient method to estimate the generalization capability of neural networks, and the insights from these quantitative results may inspire the derivation of better generalization bounds that take the internal structure of neural networks into consideration.
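The two operations the abstract describes, measuring how far final-layer weights have drifted from initialization and resetting those layers as a post-training regularizer, can be sketched as follows. This is a minimal illustration under assumed names and a toy two-layer setup; the function names (`layer_distances`, `reinit_final_layers`) and the thresholds are hypothetical, not from the paper.

```python
import numpy as np

def layer_distances(init_weights, trained_weights):
    """Per-layer Frobenius distance between trained weights and their initial values."""
    return {name: float(np.linalg.norm(trained_weights[name] - init_weights[name]))
            for name in init_weights}

def reinit_final_layers(trained_weights, init_weights, final_layer_names):
    """Post-training regularization: reset the chosen final layers to their initial values."""
    regularized = dict(trained_weights)
    for name in final_layer_names:
        regularized[name] = init_weights[name].copy()
    return regularized

# Toy example: the "final" layer drifts much farther from its init than the "early" one,
# which, per the paper's claim, would signal overfitting.
rng = np.random.default_rng(0)
init = {"early": rng.normal(size=(4, 4)), "final": rng.normal(size=(4, 2))}
trained = {"early": init["early"] + 0.01 * rng.normal(size=(4, 4)),
           "final": init["final"] + 5.0 * rng.normal(size=(4, 2))}

dists = layer_distances(init, trained)          # dists["final"] >> dists["early"]
regularized = reinit_final_layers(trained, init, ["final"])
```

In a real setting the weight dictionaries would come from model checkpoints (e.g. a snapshot of parameters saved at initialization and after training), with the distance tracked per layer over training.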