A Joint Learning Approach for Semi-supervised Neural Topic Modeling

Paper title

A Joint Learning Approach for Semi-supervised Neural Topic Modeling

Authors

Jeffrey Chiu, Rajat Mittal, Neehal Tumma, Abhishek Sharma, Finale Doshi-Velez

Abstract

Topic models are among the most popular ways to represent textual data in an interpretable manner. Recently, advances in deep generative models, specifically auto-encoding variational Bayes (AEVB), have led to the introduction of unsupervised neural topic models, which leverage deep generative models as opposed to traditional statistics-based topic models. We extend these neural topic models by introducing the Label-Indexed Neural Topic Model (LI-NTM), which is, to the best of our knowledge, the first effective upstream semi-supervised neural topic model. We find that LI-NTM outperforms existing neural topic models in document reconstruction benchmarks, with the most notable results in low labeled data regimes and for datasets with informative labels; furthermore, our jointly learned classifier outperforms baseline classifiers in ablation studies.
