Paper Title


PoF: Post-Training of Feature Extractor for Improving Generalization

Paper Authors

Ikuro Sato, Ryota Yamada, Masayuki Tanaka, Nakamasa Inoue, Rei Kawakami

Paper Abstract


It has been intensively investigated that the local shape, especially flatness, of the loss landscape near a minimum plays an important role for generalization of deep models. We developed a training algorithm called PoF: Post-Training of Feature Extractor that updates the feature extractor part of an already-trained deep model to search a flatter minimum. The characteristics are two-fold: 1) Feature extractor is trained under parameter perturbations in the higher-layer parameter space, based on observations that suggest flattening higher-layer parameter space, and 2) the perturbation range is determined in a data-driven manner aiming to reduce a part of test loss caused by the positive loss curvature. We provide a theoretical analysis that shows the proposed algorithm implicitly reduces the target Hessian components as well as the loss. Experimental results show that PoF improved model performance against baseline methods on both CIFAR-10 and CIFAR-100 datasets for only 10-epoch post-training, and on SVHN dataset for 50-epoch post-training. Source code is available at: \url{https://github.com/DensoITLab/PoF-v1}
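The post-training scheme described in the abstract can be sketched roughly as follows. This is a minimal, SAM-style stand-in, not the authors' implementation: the model, the fixed perturbation radius `rho`, and all variable names are illustrative assumptions (the paper determines the perturbation range in a data-driven manner), and the "feature extractor" and "higher layer" are both simple linear maps on toy data.

```python
import numpy as np

# Toy regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.3])

theta = rng.normal(scale=0.1, size=(4, 4))  # "feature extractor" (linear here)
w = rng.normal(scale=0.1, size=4)           # "higher-layer" (head) weights

def loss(theta, w):
    return 0.5 * np.mean((X @ theta @ w - y) ** 2)

initial_loss = loss(theta, w)
lr, rho = 0.05, 0.05  # rho: perturbation radius (data-driven in the paper; fixed here)

for _ in range(500):
    z = X @ theta                     # features
    err = (z @ w - y) / len(y)
    # Perturbation direction in the head's parameter space, taken along the
    # head gradient (a SAM-like stand-in for PoF's data-driven sampling).
    g_w = z.T @ err
    eps = rho * g_w / (np.linalg.norm(g_w) + 1e-12)

    # Update ONLY the feature extractor under the perturbed head w + eps,
    # pushing theta toward minima that are flat with respect to w.
    w_pert = w + eps
    err_p = (z @ w_pert - y) / len(y)
    g_theta = X.T @ np.outer(err_p, w_pert)
    theta -= lr * g_theta

print(loss(theta, w) < initial_loss)  # loss shrinks while w stays perturbed
```

The key design point mirrored here is that the head parameters are never updated during post-training; only the feature extractor moves, under perturbations applied to the higher-layer weights.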
