Paper Title
LQF: Linear Quadratic Fine-Tuning
Paper Authors
Paper Abstract
Classifiers that are linear in their parameters, and trained by optimizing a convex loss function, have predictable behavior with respect to changes in the training data, initial conditions, and optimization. Such desirable properties are absent in deep neural networks (DNNs), which are typically trained by non-linear fine-tuning of a pre-trained model. Previous attempts to linearize DNNs have led to interesting theoretical insights, but have not impacted practice due to the substantial performance gap compared to standard non-linear optimization. We present the first method for linearizing a pre-trained model that achieves performance comparable to non-linear fine-tuning on most of the real-world image classification tasks tested, thus enjoying the interpretability of linear models without incurring punishing losses in performance. LQF consists of simple modifications to the architecture, loss function, and optimization typically used for classification: Leaky-ReLU instead of ReLU, mean squared loss instead of cross-entropy, and pre-conditioning using Kronecker factorization. None of these changes in isolation is sufficient to approach the performance of non-linear fine-tuning. Used in combination, they allow us to reach comparable performance, and even superior performance in the low-data regime, while enjoying the simplicity, robustness, and interpretability of linear-quadratic optimization.
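The abstract names three concrete modifications (Leaky-ReLU in place of ReLU, mean squared loss on the classifier output, and Kronecker-factored pre-conditioning). The following is a minimal PyTorch sketch of the first two, assuming a torchvision ResNet-18 backbone and one-hot regression targets; it is not the paper's implementation, it does not linearize the network around the pre-trained weights, and the Kronecker-factored pre-conditioner is only indicated by a comment rather than implemented.

```python
# Minimal sketch of two of the LQF ingredients, under the assumptions stated above.
import torch
import torch.nn as nn
import torchvision.models as models


def swap_relu_for_leaky_relu(module: nn.Module, negative_slope: float = 0.01) -> None:
    """Recursively replace every ReLU with Leaky-ReLU, leaving the rest of the network intact."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.LeakyReLU(negative_slope=negative_slope, inplace=True))
        else:
            swap_relu_for_leaky_relu(child, negative_slope)


num_classes = 10  # hypothetical target task size
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)  # pre-trained backbone
swap_relu_for_leaky_relu(model)                       # modification 1: Leaky-ReLU instead of ReLU
model.fc = nn.Linear(model.fc.in_features, num_classes)

criterion = nn.MSELoss()                              # modification 2: mean squared loss instead of cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# Modification 3 (not shown): pre-condition the update with a Kronecker-factored
# curvature approximation (a K-FAC-style optimizer) instead of plain SGD.


def training_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One fine-tuning step: regress the network output onto one-hot targets."""
    targets = nn.functional.one_hot(labels, num_classes).float()
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```

As the abstract notes, neither of these changes alone closes the gap to non-linear fine-tuning; the sketch only illustrates how small the architectural and loss-function edits are relative to a standard classification setup.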