Paper Title

LIME: Weakly-Supervised Text Classification Without Seeds

Authors

Seongmin Park, Jihwa Lee

Abstract

In weakly-supervised text classification, only label names act as sources of supervision. Predominant approaches to weakly-supervised text classification use a two-phase framework, where test samples are first assigned pseudo-labels and are then used to train a neural text classifier. In most previous work, the pseudo-labeling step depends on obtaining seed words that best capture the relevance of each class label. We present LIME, a framework for weakly-supervised text classification that entirely replaces the brittle seed-word generation process with entailment-based pseudo-classification. We find that combining weakly-supervised classification and textual entailment mitigates the shortcomings of both, resulting in a more streamlined and effective classification pipeline. With just an off-the-shelf textual entailment model, LIME outperforms recent baselines in weakly-supervised text classification and achieves state-of-the-art results on 4 benchmarks. We open source our code at https://github.com/seongminp/LIME.
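
The first phase described above, entailment-based pseudo-labeling, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the hypothesis template, and the confidence threshold are all hypothetical, and `toy_entailment_score` is a token-overlap stand-in used only so the example runs without a model. In a real pipeline an off-the-shelf textual entailment model would presumably take its place.

```python
from typing import Callable, List, Tuple


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an entailment model.

    Returns the fraction of hypothesis tokens that also occur in the premise.
    A real pipeline would return an NLI model's entailment probability instead.
    """
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = set(hypothesis.lower().split())
    if not hypothesis_tokens:
        return 0.0
    return len(premise_tokens & hypothesis_tokens) / len(hypothesis_tokens)


def pseudo_label(
    texts: List[str],
    label_names: List[str],
    score_fn: Callable[[str, str], float] = toy_entailment_score,
    template: str = "this text is about {}",  # assumed hypothesis template
    min_confidence: float = 0.0,  # assumed filter, not taken from the paper
) -> List[Tuple[str, str, float]]:
    """Give each text the label whose hypothesis scores highest.

    Texts whose best score falls below min_confidence are dropped, so only
    the more confident pseudo-labels reach the classifier-training phase.
    """
    labeled = []
    for text in texts:
        # Only the label names are used as supervision: each name is turned
        # into a hypothesis and scored against the text as the premise.
        scores = {name: score_fn(text, template.format(name)) for name in label_names}
        best = max(scores, key=scores.get)
        if scores[best] >= min_confidence:
            labeled.append((text, best, scores[best]))
    return labeled
```

The returned (text, label, score) triples would then serve as training data for the neural text classifier in the second phase, which is not shown here.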
