Paper Title

Adaptable Text Matching via Meta-Weight Regulator

Authors

Bo Zhang, Chen Zhang, Fang Ma, Dawei Song

Abstract

Neural text matching models have been used in a range of applications, such as question answering and natural language inference, and have yielded good performance. However, these neural models have limited adaptability, resulting in a decline in performance when they encounter test examples from a different dataset or even a different task. Adaptability is particularly important in the few-shot setting: in many cases, only a limited amount of labeled data is available for a target dataset or task, while we may have access to a richly labeled source dataset or task. However, adapting a model trained on the abundant source data to a few-shot target dataset or task is challenging. To tackle this challenge, we propose the Meta-Weight Regulator (MWR), a meta-learning approach that learns to assign weights to source examples based on their relevance to the target loss. Specifically, MWR first trains the model on uniformly weighted source examples and measures the model's efficacy on the target examples via a loss function. By iteratively performing (meta) gradient descent, higher-order gradients are propagated to the source examples. These gradients are then used to update the weights of the source examples in a way that reflects the target performance. As MWR is model-agnostic, it can be applied to any backbone neural model. Extensive experiments are conducted with various backbone text matching models on four widely used datasets and two tasks. The results demonstrate that our proposed approach significantly outperforms a number of existing adaptation methods and effectively improves the cross-dataset and cross-task adaptability of neural text matching models in the few-shot setting.
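The abstract only sketches the weighting mechanism at a high level. Below is a minimal, hypothetical PyTorch sketch of this kind of meta-weight update, written in the generic learning-to-reweight style: per-example source weights `eps`, a virtual inner update, and higher-order gradients of the target loss with respect to `eps`. The toy backbone, feature dimensions, and the clamping/normalization step are illustrative assumptions, not the paper's exact MWR algorithm.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

# Toy stand-in for a backbone text matching model (the paper plugs in real
# matching models; this 2-layer MLP over pre-encoded pair features is only
# an illustrative assumption).
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss(reduction="none")

def mwr_step(src_x, src_y, tgt_x, tgt_y, inner_lr=1e-2):
    """One sketched meta-weighting step: score each source example by how much
    a virtual update on it would reduce the few-shot target loss."""
    params = dict(backbone.named_parameters())

    # (1) A learnable per-example weight for the source batch (initialised to
    #     zero, as in the standard learning-to-reweight formulation).
    eps = torch.zeros(src_x.size(0), requires_grad=True)

    # (2) Virtual inner update on the eps-weighted source loss; create_graph=True
    #     keeps the graph so higher-order gradients can flow back to eps.
    src_losses = loss_fn(functional_call(backbone, params, (src_x,)), src_y)
    grads = torch.autograd.grad((eps * src_losses).sum(),
                                list(params.values()), create_graph=True)
    virtual = {k: p - inner_lr * g for (k, p), g in zip(params.items(), grads)}

    # (3) Target loss of the virtually updated model; its gradient w.r.t. eps
    #     measures each source example's relevance to the target performance.
    tgt_loss = loss_fn(functional_call(backbone, virtual, (tgt_x,)), tgt_y).mean()
    eps_grad = torch.autograd.grad(tgt_loss, eps)[0]

    # (4) Turn the (negated) meta-gradients into non-negative, normalised weights.
    w = torch.clamp(-eps_grad, min=0.0)
    if w.sum() > 0:
        w = w / w.sum()

    # (5) Real update of the backbone on the re-weighted source loss.
    optimizer.zero_grad()
    (w.detach() * loss_fn(backbone(src_x), src_y)).sum().backward()
    optimizer.step()
    return tgt_loss.item()

# Toy usage: random tensors stand in for encoded source/target sentence pairs.
src_x, src_y = torch.randn(16, 32), torch.randint(0, 2, (16,))
tgt_x, tgt_y = torch.randn(4, 32), torch.randint(0, 2, (4,))
print(mwr_step(src_x, src_y, tgt_x, tgt_y))
```

Because the meta-gradient only re-scores source examples, this scheme works with any differentiable backbone, which is the sense in which the method is model-agnostic.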
