Paper Title

PEP: Parameter Ensembling by Perturbation

Paper Authors

Alireza Mehrtash, Purang Abolmaesumi, Polina Golland, Tina Kapur, Demian Wassermann, William M. Wells III

Paper Abstract

Ensembling is now recognized as an effective approach for increasing the predictive performance and calibration of deep networks. We introduce a new approach, Parameter Ensembling by Perturbation (PEP), that constructs an ensemble of parameter values as random perturbations of the optimal parameter set from training by a Gaussian with a single variance parameter. The variance is chosen to maximize the log-likelihood of the ensemble average ($\mathbb{L}$) on the validation data set. Empirically, and perhaps surprisingly, $\mathbb{L}$ has a well-defined maximum as the variance grows from zero (which corresponds to the baseline model). Conveniently, the calibration level of predictions also tends to grow favorably until the peak of $\mathbb{L}$ is reached. In most experiments, PEP provides a small improvement in performance, and, in some cases, a substantial improvement in empirical calibration. We show that this "PEP effect" (the gain in log-likelihood) is related to the mean curvature of the likelihood function and the empirical Fisher information. Experiments on ImageNet pre-trained networks including ResNet, DenseNet, and Inception showed improved calibration and likelihood. We further observed a mild improvement in classification accuracy on these networks. Experiments on classification benchmarks such as MNIST and CIFAR-10 showed improved calibration and likelihood, as well as a relationship between the PEP effect and overfitting; this suggests that PEP can be used to probe the level of overfitting that occurred during training. In general, no special training procedure or network architecture is needed, and in the case of pre-trained networks, no additional training is needed.
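The abstract describes a simple procedure: draw ensemble members as Gaussian perturbations of the trained weights, average their predictive probabilities, and pick the perturbation scale that maximizes the validation log-likelihood $\mathbb{L}$. Below is a minimal sketch of that loop, assuming a trained PyTorch classifier and a validation DataLoader built with shuffle=False (so batches align across members); the function names pep_log_likelihood and pep_select_sigma, the member count, and the sigma grid are illustrative assumptions, not the authors' released code.

```python
import copy
import torch
import torch.nn.functional as F

@torch.no_grad()
def pep_log_likelihood(model, loader, sigma, n_members=10, device="cpu"):
    """Mean validation log-likelihood of the PEP ensemble average: each member
    is theta* + eps with eps ~ N(0, sigma^2 I), and predictive probabilities
    are averaged over members before taking the log."""
    base = copy.deepcopy(model).to(device).eval()
    probs_sum = None
    for _ in range(n_members):
        member = copy.deepcopy(base)
        for p in member.parameters():
            p.add_(torch.randn_like(p) * sigma)  # isotropic Gaussian perturbation
        batch_probs = [F.softmax(member(x.to(device)), dim=1).cpu() for x, _ in loader]
        member_probs = torch.cat(batch_probs)
        probs_sum = member_probs if probs_sum is None else probs_sum + member_probs
    targets = torch.cat([y for _, y in loader])
    mean_probs = probs_sum / n_members  # ensemble-average predictive distribution
    ll = torch.log(mean_probs[torch.arange(len(targets)), targets] + 1e-12)
    return ll.mean().item()

def pep_select_sigma(model, loader, sigmas, **kwargs):
    """Sweep sigma upward from zero; since L is empirically unimodal in the
    variance, take the argmax of the sweep as the PEP scale."""
    scores = {s: pep_log_likelihood(model, loader, s, **kwargs) for s in sigmas}
    best = max(scores, key=scores.get)
    return best, scores
```

A sweep such as sigmas = [0.0, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2] (a hypothetical grid) includes sigma = 0, which recovers the baseline model's log-likelihood as a reference point; the gap between the peak and the sigma = 0 value is the "PEP effect" the abstract relates to overfitting.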
