Paper Title
A Joint Learning Approach for Semi-supervised Neural Topic Modeling
Authors
Abstract
Topic models are among the most popular ways to represent textual data in an interpretable manner. Recently, advances in deep generative models, specifically auto-encoding variational Bayes (AEVB), have led to the introduction of unsupervised neural topic models, which leverage deep generative models as opposed to traditional statistics-based topic models. We extend these neural topic models by introducing the Label-Indexed Neural Topic Model (LI-NTM), which is, to the best of our knowledge, the first effective upstream semi-supervised neural topic model. We find that LI-NTM outperforms existing neural topic models in document reconstruction benchmarks, with the most notable results in regimes with little labeled data and for datasets with informative labels; furthermore, our jointly learned classifier outperforms baseline classifiers in ablation studies.