Paper Title

Topic Detection from Conversational Dialogue Corpus with Parallel Dirichlet Allocation Model and Elbow Method

Authors

Khalid, Haider; Wade, Vincent

Abstract

A conversational system needs to know how to switch between topics to sustain a conversation over a longer period. For this reason, topic detection from a dialogue corpus has become an important task, and accurate prediction of conversation topics is important for creating coherent and engaging dialogue systems. In this paper, we propose a topic detection approach based on the Parallel Latent Dirichlet Allocation (PLDA) model, which clusters a vocabulary of known similar words using TF-IDF scores and the Bag of Words (BOW) technique. In the experiments, we use K-means clustering with the Elbow method to interpret and validate within-cluster consistency and to select the optimal number of clusters. We evaluate our approach by comparing it with traditional LDA and clustering techniques. The experimental results show that combining PLDA with the Elbow method selects the optimal number of clusters and refines the topics for the conversation.
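
The Elbow step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' PLDA pipeline: it only shows how TF-IDF vectors, K-means, and the within-cluster sum of squares might be combined to pick a cluster count. The helper names (`tfidf_vectors`, `kmeans_inertia`, `elbow_k`), the toy utterances, and the choice of the largest second difference as the "elbow" are all assumptions made for illustration; the abstract does not specify how the elbow is located.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, rows) where each row is an L2-normalised TF-IDF vector."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumed weighting; the paper's exact formula is not given).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return vocab, rows


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, n_init=10, max_iter=50, seed=0):
    """Lowest within-cluster sum of squares found over n_init random restarts."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_init):
        centroids = [list(p) for p in rng.sample(points, k)]
        for _ in range(max_iter):
            # Assignment step: index of the nearest centroid for every point.
            labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
            # Update step: mean of each cluster; an empty cluster keeps its centroid.
            updated = []
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:
                    updated.append([sum(col) / len(members) for col in zip(*members)])
                else:
                    updated.append(centroids[c])
            if updated == centroids:
                break
            centroids = updated
        inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
        best = min(best, inertia)
    return best


def elbow_k(inertias):
    """Pick k at the largest second difference; inertias[i] belongs to k = i + 1."""
    second_diffs = [inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
                    for i in range(1, len(inertias) - 1)]
    return second_diffs.index(max(second_diffs)) + 2


if __name__ == "__main__":
    # Hypothetical toy utterances, not data from the paper.
    utterances = [
        "i love football and the match last night",
        "the football match was great",
        "what should we cook for dinner",
        "dinner was pasta and salad",
        "the weather is rainy today",
        "rainy weather again tomorrow",
    ]
    _, rows = tfidf_vectors(utterances)
    curve = [kmeans_inertia(rows, k) for k in range(1, 6)]
    print("inertia per k:", [round(v, 3) for v in curve])
    print("elbow at k =", elbow_k(curve))
```

In a fuller pipeline the selected cluster count would presumably be passed on as the number of topics for the LDA or PLDA model; that hand-off is not shown here because the abstract does not describe it in detail.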
