Paper Title
Domain Adaptation from Scratch
Paper Authors
Paper Abstract
Natural language processing (NLP) algorithms are rapidly improving but often struggle when applied to out-of-distribution examples. A prominent approach to mitigating the domain gap is domain adaptation, where a model trained on a source domain is adapted to a new target domain. We present a new learning setup, "domain adaptation from scratch", which we believe to be crucial for extending the reach of NLP to sensitive domains in a privacy-preserving manner. In this setup, we aim to efficiently annotate data from a set of source domains such that the trained model performs well on a sensitive target domain from which data is unavailable for annotation. Our study compares several approaches for this challenging setup, ranging from data selection and domain adaptation algorithms to active learning paradigms, on two NLP tasks: sentiment analysis and named entity recognition. Our results suggest that using the above-mentioned approaches eases the domain gap, and that combining them further improves the results.