Paper Title
Author2Vec: A Framework for Generating User Embedding
Paper Authors
Abstract
Online forums and social media platforms provide noisy but valuable data every day. In this paper, we propose Author2Vec, a novel end-to-end neural network-based user embedding system. The model combines sentence representations generated by BERT (Bidirectional Encoder Representations from Transformers) with a novel unsupervised pre-training objective, authorship classification, to produce user embeddings that better encode useful user-intrinsic properties. The system was pre-trained on post data from 10k Reddit users and was analyzed and evaluated on two user classification benchmarks, depression detection and personality classification, on which it outperformed traditional count-based and prediction-based methods. We substantiate that Author2Vec successfully encodes useful user attributes and that the generated user embeddings perform well in downstream classification tasks without further fine-tuning.
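The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea of an authorship classification objective over sentence representations. It assumes mean pooling over a user's post vectors, a single linear projection with tanh as the user embedding, and a linear softmax classifier over authors. The function names, the toy dimensions, and the random vectors standing in for BERT sentence representations are all hypothetical and are not taken from the paper.

```python
import math
import random

def mean_pool(post_vectors):
    """Average a user's post vectors into one fixed-size vector (assumed pooling)."""
    n = len(post_vectors)
    return [sum(col) / n for col in zip(*post_vectors)]

def linear(weights, vector):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def user_embedding(post_vectors, encoder_weights):
    """Project the pooled post vectors; tanh keeps each value in (-1, 1)."""
    pooled = mean_pool(post_vectors)
    return [math.tanh(h) for h in linear(encoder_weights, pooled)]

def authorship_loss(post_vectors, author_id, encoder_weights, classifier_weights):
    """Cross-entropy of predicting which author wrote the given posts."""
    emb = user_embedding(post_vectors, encoder_weights)
    probs = softmax(linear(classifier_weights, emb))
    return -math.log(probs[author_id]), emb

rng = random.Random(0)
SENT_DIM, EMB_DIM, NUM_AUTHORS = 8, 4, 3  # toy sizes, not the paper's

encoder_weights = [[rng.uniform(-0.5, 0.5) for _ in range(SENT_DIM)]
                   for _ in range(EMB_DIM)]
classifier_weights = [[rng.uniform(-0.5, 0.5) for _ in range(EMB_DIM)]
                      for _ in range(NUM_AUTHORS)]

# Random vectors stand in for BERT sentence representations of five posts.
posts = [[rng.gauss(0, 1) for _ in range(SENT_DIM)] for _ in range(5)]

loss, emb = authorship_loss(posts, 1, encoder_weights, classifier_weights)
```

In a sketch like this, minimizing the loss over many users would push the intermediate vector `emb` to carry author-specific information. After pre-training, the classifier weights could be discarded and `emb` used as a fixed feature vector for a downstream classifier, which matches the abstract's claim of use without further fine-tuning.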