Paper Title
On Calibration of Mixup Training for Deep Neural Networks
Paper Authors
Paper Abstract
Deep Neural Networks (DNNs) represent the state of the art in many tasks. However, due to their overparameterization, their generalization capabilities are in doubt and still a field under study. Consequently, DNNs can overfit and assign overconfident predictions, effects that have been shown to affect the calibration of the confidences assigned to unseen data. Data Augmentation (DA) strategies have been proposed to regularize these models, with Mixup being one of the most popular due to its ability to improve the accuracy, uncertainty quantification, and calibration of DNNs. In this work, however, we argue and provide empirical evidence that, due to its fundamentals, Mixup does not necessarily improve calibration. Based on our observations, we propose a new loss function that improves the calibration, and sometimes also the accuracy, of DNNs trained with this DA technique. Our loss is inspired by Bayes decision theory and introduces a new training framework for designing losses for probabilistic modelling. We provide state-of-the-art accuracy with consistent improvements in calibration performance. Appendix and code are provided here: https://github.com/jmaronas/calibration_MixupDNN_ARCLoss.pytorch.git
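For context, standard Mixup (Zhang et al., 2018) trains on convex combinations of pairs of examples and their labels: x̃ = λ·x_i + (1−λ)·x_j and ỹ = λ·y_i + (1−λ)·y_j, with λ drawn from Beta(α, α). The sketch below shows this baseline procedure in PyTorch only to illustrate the DA technique the abstract refers to; the function names (mixup_data, mixup_criterion) and the value α = 0.4 are illustrative assumptions, not taken from the paper's repository, and the authors' proposed calibration-aware loss is not reproduced here because the abstract does not specify it.

```python
# Minimal sketch of standard Mixup training in PyTorch (not the paper's proposed loss).
import numpy as np
import torch
import torch.nn.functional as F

def mixup_data(x, y, alpha=0.4):
    """Convexly combine a batch with a shuffled copy of itself.

    lambda is sampled from Beta(alpha, alpha). Returns the mixed inputs,
    both sets of original labels, and the mixing weight so the loss can
    be interpolated with the same lambda.
    """
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    index = torch.randperm(x.size(0), device=x.device)
    mixed_x = lam * x + (1.0 - lam) * x[index]
    return mixed_x, y, y[index], lam

def mixup_criterion(logits, y_a, y_b, lam):
    """Standard Mixup objective: the same convex combination applied to
    the cross-entropy terms of the two original label sets."""
    return lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)

# Usage inside a training step (model, x, y assumed to exist):
#   mixed_x, y_a, y_b, lam = mixup_data(x, y, alpha=0.4)
#   loss = mixup_criterion(model(mixed_x), y_a, y_b, lam)
#   loss.backward()
```

Calibration improvements of the kind discussed in the abstract are commonly quantified with metrics such as the Expected Calibration Error, which compares per-bin confidence against per-bin accuracy on held-out data.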