Paper Title
Improving Stability of Fine-Tuning Pretrained Language Models via Component-Wise Gradient Norm Clipping
Paper Authors
Paper Abstract
Fine-tuning over large pretrained language models (PLMs) has established many state-of-the-art results. Despite its superior performance, such fine-tuning can be unstable, resulting in significant variance in performance and potential risks for practical applications. Previous works have attributed such instability to the catastrophic forgetting problem in the top layers of PLMs, which suggests that iteratively fine-tuning layers in a top-down manner is a promising solution. In this paper, we first point out that this method does not always work, due to the different convergence speeds of different layers/modules. Inspired by this observation, we propose a simple component-wise gradient norm clipping method to adjust the convergence speed for different components. Experimental results demonstrate that our method achieves consistent improvements in terms of generalization performance, convergence speed, and training stability. The codebase can be found at https://github.com/yangalan123/FineTuningStability.
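The core idea can be sketched in a few lines: instead of clipping the gradient norm of all parameters jointly, each component (e.g., a layer or module) is clipped against its own threshold, so fast-converging components can be slowed down independently. The following is a minimal illustrative sketch, not the authors' implementation; the function name, the grouping of gradients by component name, and the per-component threshold dictionary are all assumptions for illustration.

```python
import math

def clip_grad_norm_componentwise(grads_by_component, max_norms, eps=1e-6):
    """Clip each component's gradients to that component's own max norm.

    grads_by_component: dict mapping a component name (e.g. a layer name)
                        to a flat list of its gradient values.
    max_norms:          dict mapping component names to clipping thresholds;
                        components without an entry are left unclipped.
    Returns a new dict with per-component rescaled gradients.
    """
    clipped = {}
    for name, grads in grads_by_component.items():
        # L2 norm of this component's gradient only, not the global norm.
        total_norm = math.sqrt(sum(g * g for g in grads))
        max_norm = max_norms.get(name, float("inf"))
        # Scale down only when this component exceeds its own threshold.
        scale = min(1.0, max_norm / (total_norm + eps))
        clipped[name] = [g * scale for g in grads]
    return clipped
```

In a real training loop this would run between the backward pass and the optimizer step, playing the same role as a global `clip_grad_norm_` call but with one threshold per component, so that a top layer with large gradients can be clipped aggressively while lower layers keep their full update.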