Paper Title
To Softmax, or not to Softmax: that is the question when applying Active Learning for Transformer Models
Paper Authors
Paper Abstract
Despite achieving state-of-the-art results in nearly all Natural Language Processing applications, fine-tuning Transformer-based language models still requires a significant amount of labeled data to work. A well-known technique to reduce the amount of human effort in acquiring a labeled dataset is \textit{Active Learning} (AL): an iterative process in which only a minimal number of samples are labeled. AL strategies require access to a quantified confidence measure of the model predictions. A common choice is the softmax activation function of the final layer. As the softmax function provides misleading probabilities, this paper compares eight alternatives on seven datasets. Our almost paradoxical finding is that most of the methods are too good at identifying the truly most uncertain samples (outliers), and that exclusively labeling these outliers therefore results in worse performance. As a heuristic, we propose to systematically ignore such samples, which improves various methods compared to the softmax function.
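The abstract refers to AL strategies that rank unlabeled samples by a softmax-based confidence measure. The following minimal Python sketch illustrates a standard least-confidence acquisition step of the kind such strategies build on; it is an illustrative assumption for context, not the authors' method, and the function names (`softmax`, `least_confidence_query`) are hypothetical.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax over class logits (numerically stabilized)."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def least_confidence_query(logits: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k unlabeled samples with the lowest top-class probability.

    Illustrative baseline only: the paper argues these softmax "probabilities"
    can be misleading and compares eight alternative confidence measures.
    """
    probs = softmax(logits)
    confidence = probs.max(axis=1)      # top-class probability per sample
    return np.argsort(confidence)[:k]   # least confident samples first

# Hypothetical usage: logits for five unlabeled samples over three classes.
logits = np.array([[2.0, 0.1, 0.1],
                   [0.4, 0.3, 0.3],
                   [1.5, 1.4, 0.1],
                   [3.0, 0.0, 0.0],
                   [0.2, 0.2, 0.2]])
print(least_confidence_query(logits, k=2))
```

In a full AL loop, the queried samples would be sent to a human annotator, added to the labeled set, and the model retrained before the next query round; the paper's proposed heuristic additionally skips the most extreme (outlier-like) candidates rather than labeling them.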