Paper Title
Distance-Based Regularisation of Deep Networks for Fine-Tuning
Paper Authors
Paper Abstract
We investigate approaches to regularisation during fine-tuning of deep neural networks. First, we provide a neural network generalisation bound based on Rademacher complexity that uses the distance the weights have moved from their initial values. This bound has no direct dependence on the number of weights and compares favourably to other bounds when applied to convolutional networks. Our bound is highly relevant for fine-tuning, because providing a network with a good initialisation based on transfer learning means that learning can modify the weights less, and hence achieve tighter generalisation. Inspired by this, we develop a simple yet effective fine-tuning algorithm that constrains the hypothesis class to a small sphere centred on the initial pre-trained weights, thus obtaining provably better generalisation performance than conventional transfer learning. Empirical evaluation shows that our algorithm works well, corroborating our theoretical results. It outperforms both state-of-the-art fine-tuning competitors and penalty-based alternatives that, as we show, do not directly constrain the radius of the search space.
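The abstract does not state the bound itself. For orientation only, the standard Rademacher-complexity generalisation bound that results of this kind build on has the following familiar form (for a loss bounded in [0, 1]; the notation here is generic, not the paper's):

```latex
% Generic Rademacher generalisation bound (not the paper's specific
% result): with probability at least 1 - \delta over an i.i.d. sample
% of size n, for all hypotheses h in the class H,
\[
  L(h) \;\le\; \hat{L}(h) \;+\; 2\,\hat{\mathfrak{R}}_n(H)
  \;+\; 3\sqrt{\frac{\ln(2/\delta)}{2n}},
\]
% where L is the risk, \hat{L} the empirical risk, and
% \hat{\mathfrak{R}}_n(H) the empirical Rademacher complexity of H.
% Shrinking H to a small ball around the pre-trained weights shrinks
% \hat{\mathfrak{R}}_n(H), tightening the bound with no explicit
% dependence on the number of weights.
```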
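The abstract likewise does not spell out the algorithm. A minimal sketch of the constrained fine-tuning idea, assuming PyTorch and a hypothetical per-parameter `radius` hyperparameter, is to project the weights back onto an L2 ball centred on the pre-trained values after every optimiser step:

```python
import torch


def project_to_ball(model, init_params, radius):
    """Project each parameter tensor back onto an L2 ball of the given
    radius centred on its pre-trained value.

    `init_params` maps parameter names to snapshots of the pre-trained
    weights; `radius` is a hypothetical hyperparameter, not a value
    taken from the paper.
    """
    with torch.no_grad():
        for name, param in model.named_parameters():
            delta = param - init_params[name]
            norm = delta.norm()
            if norm > radius:
                # Rescale the displacement so the weights sit on the
                # sphere's surface, keeping the search inside the ball.
                param.copy_(init_params[name] + delta * (radius / norm))


# Usage sketch: snapshot the pre-trained weights once, then project
# after each optimiser update during fine-tuning.
#
#   init_params = {n: p.detach().clone()
#                  for n, p in model.named_parameters()}
#   for x, y in loader:
#       loss = criterion(model(x), y)
#       loss.backward()
#       optimiser.step()
#       optimiser.zero_grad()
#       project_to_ball(model, init_params, radius=1.0)
```

Unlike a penalty term pulling the weights towards their initial values, this projection enforces a hard cap on how far the weights can move, which matches the distinction the abstract draws between constraining the search radius directly and penalty-based alternatives that do not.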