Paper Title

TopicModel4J: A Java Package for Topic Models

Authors

Qian, Yang; Jiang, Yuanchun; Chai, Yidong; Liu, Yezheng; Sun, Jiansha

Abstract

Topic models provide a flexible and principled framework for exploring hidden structure in high-dimensional co-occurrence data and are commonly used in natural language processing (NLP) of text. In this paper, we design and implement a Java package, TopicModel4J, which contains 13 representative algorithms for fitting topic models. TopicModel4J provides data analysts working in the Java programming environment with an easy-to-use interface for running the algorithms, and allows data to be input and output easily. In addition, the package provides several preprocessing techniques for unstructured text, such as splitting textual data into words, lowercasing the words, performing lemmatization, and removing useless characters, URLs, and stop words.
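The preprocessing steps named in the abstract can be illustrated with a minimal sketch in plain Java. This is not the TopicModel4J API: the class name TextPreprocessSketch, the method preprocess, and the tiny stop-word list are all hypothetical and chosen only for illustration. Lemmatization is omitted because it would require an external NLP library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: NOT part of TopicModel4J; all names here are assumptions.
final class TextPreprocessSketch {

    // Hypothetical, deliberately tiny stop-word list.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "for", "in", "is", "of", "the", "to");

    static List<String> preprocess(String text) {
        // 1. Remove URLs before any other cleaning touches their punctuation.
        String cleaned = text.replaceAll("https?://\\S+", " ");
        // 2. Lowercase the text.
        cleaned = cleaned.toLowerCase(Locale.ROOT);
        // 3. Replace every run of non-letter characters with a single space.
        cleaned = cleaned.replaceAll("[^\\p{L}]+", " ").trim();

        List<String> tokens = new ArrayList<>();
        if (cleaned.isEmpty()) {
            return tokens;
        }
        // 4. Split into words and drop stop words.
        for (String token : cleaned.split("\\s+")) {
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [visit, topic, models]
        System.out.println(preprocess("Visit https://example.com for THE Topic Models!"));
    }
}
```

The URL is removed first so that its slashes and dots are not turned into stray tokens by the later character-cleaning step.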
