Paper title
DAME: Domain Adaptation for Matching Entities
Paper authors
Paper abstract
Entity matching (EM) identifies data records that refer to the same real-world entity. Despite efforts in recent years to improve EM performance, existing methods still require a large amount of labeled data in each domain during training. These methods treat each domain individually and capture dataset-specific signals, which leads to overfitting on a single dataset. The knowledge learned from one dataset is not used to better understand the EM task in order to make predictions on unseen datasets with fewer labeled samples. In this paper, we propose a new domain adaptation-based method that transfers task knowledge from multiple source domains to a target domain. Our method introduces a new setting for EM in which the objective is to capture task-specific knowledge by pretraining our model on multiple source domains and then testing it on a target domain. We study the zero-shot learning case on the target domain and demonstrate that our method learns the EM task and transfers knowledge to the target domain. We also extensively study fine-tuning our model on the target dataset from multiple domains, and demonstrate that our model generalizes better than state-of-the-art methods in EM.