Paper Title

Adaptive Fine-Tuning of Transformer-Based Language Models for Named Entity Recognition

Authors

Stollenwerk, Felix

Abstract

The current standard approach for fine-tuning transformer-based language models includes a fixed number of training epochs and a linear learning rate schedule. In order to obtain a near-optimal model for the given downstream task, a search in optimization hyperparameter space is usually required. In particular, the number of training epochs needs to be adjusted to the dataset size. In this paper, we introduce adaptive fine-tuning, which is an alternative approach that uses early stopping and a custom learning rate schedule to dynamically adjust the number of training epochs to the dataset size. For the example use case of named entity recognition, we show that our approach not only makes hyperparameter search with respect to the number of training epochs redundant, but also leads to improved results in terms of performance, stability and efficiency. This holds true especially for small datasets, where we outperform the state-of-the-art fine-tuning method by a large margin.
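
The abstract names only the two ingredients of adaptive fine-tuning: early stopping and a custom learning rate schedule that together determine the number of training epochs dynamically. Below is a minimal PyTorch sketch of one plausible realization, assuming a constant learning rate during an initial phase followed by a short linear cool-down once validation stops improving. The tiny linear model, random data, validation-loss stopping criterion, and all hyperparameter values (PATIENCE, COOLDOWN_EPOCHS, MAX_EPOCHS) are illustrative placeholders, not the paper's settings or reference implementation.

```python
"""Sketch of adaptive fine-tuning: early stopping plus a two-phase
learning-rate schedule (constant, then linear cool-down). This is one
interpretation of the abstract, not the authors' implementation."""

import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny classifier and random tensors replace
# the transformer NER model and dataset used in the paper.
model = nn.Linear(16, 4)
optimizer = AdamW(model.parameters(), lr=2e-5)
x_train, y_train = torch.randn(64, 16), torch.randint(0, 4, (64,))
x_val, y_val = torch.randn(32, 16), torch.randint(0, 4, (32,))
loss_fn = nn.CrossEntropyLoss()

PATIENCE = 3         # early-stopping patience (assumed value)
COOLDOWN_EPOCHS = 2  # length of the linear decay phase (assumed value)
MAX_EPOCHS = 100     # hard upper bound; the adaptive stop usually fires first

best_loss, stale = float("inf"), 0
cooldown_started_at = None  # set once validation stops improving


def lr_factor(epoch):
    """Constant LR until the cool-down starts, then linear decay toward 0."""
    if cooldown_started_at is None:
        return 1.0
    progress = (epoch - cooldown_started_at) / COOLDOWN_EPOCHS
    return max(0.0, 1.0 - progress)


scheduler = LambdaLR(optimizer, lr_lambda=lr_factor)

for epoch in range(MAX_EPOCHS):
    # Training ends adaptively once the linear cool-down is exhausted.
    if cooldown_started_at is not None and epoch - cooldown_started_at >= COOLDOWN_EPOCHS:
        break

    model.train()
    optimizer.zero_grad()
    loss_fn(model(x_train), y_train).backward()
    optimizer.step()
    scheduler.step()

    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(x_val), y_val).item()

    if cooldown_started_at is None:
        if val_loss < best_loss:
            best_loss, stale = val_loss, 0
        else:
            stale += 1
        if stale >= PATIENCE:
            # Validation has stalled: instead of stopping immediately,
            # switch from the constant phase to the linear cool-down.
            cooldown_started_at = epoch + 1
```

Because the stopping point and the decay phase are driven by validation behavior rather than a preset epoch count, the same configuration adapts to small and large datasets alike, which is the property the abstract highlights. The paper evaluates NER with F1, so a score-based criterion would replace the validation loss used here.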
