Paper Title
Semi-Supervised Clustering with Contrastive Learning for Discovering New Intents
Paper Authors
Paper Abstract
Most real-world dialogue systems rely on predefined intents and answers for QA services, so discovering potential intents from a large corpus in advance is important for building such services. Because in most scenarios only a few intents are already known and most are still waiting to be discovered, we focus on semi-supervised text clustering and try to make the proposed method benefit from labeled samples for better overall clustering performance. In this paper, we propose Deep Contrastive Semi-supervised Clustering (DCSC), which clusters text samples in a semi-supervised way and provides grouped intents to operations staff. To make full use of the limited known intents, we propose a two-stage training procedure in which DCSC is trained on both labeled and unlabeled samples, yielding better text representations and clustering performance. We conduct experiments on two public datasets to compare our model with several popular methods. The results show that DCSC achieves the best performance across all datasets and settings, indicating the effectiveness of the improvements in our work.
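The abstract does not spell out the training objective, so the following is only a minimal sketch of how a contrastive loss can draw on labeled and unlabeled samples together. It is a generic InfoNCE-style loss, not the DCSC objective itself. The names `cosine`, `contrastive_loss`, `groups`, and `temperature` are hypothetical. The assumption is that a labeled sample uses its known intent as its group id, while an unlabeled sample shares a unique group id only with an augmented view of the same text.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(embeddings, groups, temperature=0.5):
    """Average InfoNCE-style loss over all anchors that have a positive.

    Assumption (not from the paper): samples with the same group id are
    positives for each other; every other sample in the batch is a negative.
    """
    n = len(embeddings)
    losses = []
    for i in range(n):
        positives = [j for j in range(n) if j != i and groups[j] == groups[i]]
        if not positives:
            continue  # this anchor has no positive in the batch, so skip it
        # Exponentiated, temperature-scaled similarity to every other sample.
        sims = {
            j: math.exp(cosine(embeddings[i], embeddings[j]) / temperature)
            for j in range(n)
            if j != i
        }
        denom = sum(sims.values())
        # Mean negative log-probability of picking a positive for this anchor.
        anchor_loss = -sum(math.log(sims[j] / denom) for j in positives)
        losses.append(anchor_loss / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: two tight pairs of 2-D embeddings.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = contrastive_loss(embeddings, ["a", "a", "b", "b"])
misaligned = contrastive_loss(embeddings, ["a", "b", "a", "b"])
```

In this toy batch the loss is lower when the group ids agree with the geometry of the embeddings (`aligned`) than when they cut across it (`misaligned`). Minimizing such a loss therefore pulls same-group samples together, which is the property a later clustering step would rely on.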