Title

Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification

Authors

Hui Ye, Zhiyu Chen, Da-Han Wang, Brian D. Davison

Abstract

Extreme multi-label text classification (XMTC) is the task of tagging a given text with the most relevant labels from an extremely large label set. We propose a novel deep learning method called APLC-XLNet. Our approach fine-tunes the recently released generalized autoregressive pretrained model (XLNet) to learn a dense representation for the input text. We propose Adaptive Probabilistic Label Clusters (APLC) to approximate the cross-entropy loss by exploiting the unbalanced label distribution to form clusters that explicitly reduce the computational time. Our experiments on five benchmark datasets show that our approach achieves new state-of-the-art results on four of them. Our source code is publicly available at https://github.com/huiyegit/APLC_XLNet.
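
The abstract only names APLC, so the following is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation (see their repository for that). It assumes an adaptive-softmax-style design. Labels are ranked by training frequency and split at hypothetical cutoffs into one head cluster and several tail clusters. The head scores the frequent labels plus one node per tail cluster. A tail label's probability is factored as p(cluster) times p(label | cluster) using sigmoids, with binary cross-entropy as the loss. Each tail cluster first projects the hidden vector to a smaller dimension. The function names, cutoffs, and cost model are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple


def build_label_clusters(label_lists: Sequence[Sequence[int]],
                         cutoffs: Sequence[int]) -> List[List[int]]:
    """Rank labels by frequency (most frequent first) and split them at `cutoffs`.

    Cluster 0 is the head; the remaining clusters are tails. Labels that never
    occur in `label_lists` are not assigned to any cluster in this sketch.
    """
    freq = Counter(label for labels in label_lists for label in labels)
    # Break frequency ties by label id so the clustering is deterministic.
    ranked = sorted(freq, key=lambda label: (-freq[label], label))
    bounds = [0, *cutoffs, len(ranked)]
    return [ranked[lo:hi] for lo, hi in zip(bounds, bounds[1:])]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def factored_probabilities(clusters: List[List[int]],
                           head_scores: Sequence[float],
                           tail_scores: Sequence[Sequence[float]]) -> Dict[int, float]:
    """Per-label probabilities under the assumed two-level factorization.

    head_scores: one score per head label, then one score per tail cluster.
    tail_scores[i]: one score per label in tail cluster i + 1.
    """
    head = clusters[0]
    probs = {label: sigmoid(s) for label, s in zip(head, head_scores)}
    for i, cluster in enumerate(clusters[1:]):
        # p(label) = p(cluster node fires) * p(label | cluster)
        p_cluster = sigmoid(head_scores[len(head) + i])
        for label, s in zip(cluster, tail_scores[i]):
            probs[label] = p_cluster * sigmoid(s)
    return probs


def binary_cross_entropy(probs: Dict[int, float], positives: Set[int],
                         eps: float = 1e-12) -> float:
    """Mean binary cross-entropy over all labels in `probs`."""
    total = 0.0
    for label, p in probs.items():
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p) if label in positives else math.log(1.0 - p)
    return total / len(probs)


def output_layer_cost(cluster_sizes: Sequence[int], hidden_dim: int,
                      tail_dims: Sequence[int]) -> Tuple[int, int]:
    """Multiply-adds per instance: flat output layer vs. clustered layer.

    Assumes every tail cluster is evaluated and that tail cluster i first
    projects the hidden vector from `hidden_dim` down to `tail_dims[i]`.
    """
    flat = hidden_dim * sum(cluster_sizes)
    n_tails = len(cluster_sizes) - 1
    head = hidden_dim * (cluster_sizes[0] + n_tails)
    tail = sum(hidden_dim * d + d * n
               for d, n in zip(tail_dims, cluster_sizes[1:]))
    return flat, head + tail
```

For example, with a hypothetical 768-dimensional encoder output, cluster sizes [10000, 90000, 400000], and tail dimensions [384, 192], `output_layer_cost` returns (384000000, 119483904). In this sketch the savings come only from the smaller tail projections, which is where the unbalanced label distribution matters: most labels are rare and land in the cheap tail clusters.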
