Paper Title

On Optimal Early Stopping: Over-informative versus Under-informative Parametrization

Paper Authors

Ruoqi Shen, Liyao Gao, Yi-An Ma

Paper Abstract

Early stopping is a simple and widely used method to prevent over-training neural networks. We develop theoretical results to reveal the relationship between the optimal early stopping time and the model dimension, as well as the sample size of the dataset, for certain linear models. Our results demonstrate two very different behaviors when the model dimension exceeds the number of features versus the opposite scenario. While most previous works on linear models focus on the latter setting, we observe that the model dimension often exceeds the number of features arising from the data in common deep learning tasks, and we propose a model to study this setting. We demonstrate experimentally that our theoretical results on the optimal early stopping time correspond to the training process of deep neural networks.
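Since the abstract centers on locating the optimal early stopping time for gradient descent on linear models, a minimal sketch of that setup may help. This is a generic illustration, not the authors' code or their specific over-/under-informative feature model; the dimensions n and d, the noise level sigma, the learning rate, and the iteration count below are all illustrative assumptions, and the stopping time is found by simply tracking held-out risk along the training trajectory.

```python
# Minimal sketch: early stopping for gradient descent on noisy linear
# regression, with the stopping time chosen by held-out risk.
# All parameters here are illustrative assumptions, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

n, d = 100, 400      # sample size n, model dimension d (here d > n)
sigma = 0.5          # label noise level (assumed)

# Ground-truth linear model and noisy training data.
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_star + sigma * rng.normal(size=n)

# Held-out data, used only to measure risk along the trajectory.
X_test = rng.normal(size=(1000, d))
y_test = X_test @ w_star

w = np.zeros(d)      # gradient descent from zero initialization
lr = 1e-3
risks = []
for t in range(2000):
    grad = X.T @ (X @ w - y) / n   # gradient of the mean squared error
    w -= lr * grad
    risks.append(np.mean((X_test @ w - y_test) ** 2))

t_opt = int(np.argmin(risks))      # empirical optimal early stopping time
print(f"optimal stopping step: {t_opt}, test risk there: {risks[t_opt]:.4f}")
```

Along such a trajectory the held-out risk typically first decreases and then rises as the model begins to fit noise, so the argmin above serves as the empirical analogue of the optimal stopping time whose dependence on model dimension and sample size the paper analyzes.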
