Title
Predicting Political Ideology from Digital Footprints
Authors
Abstract
This paper proposes a new method to predict individual political ideology from digital footprints on one of the world's largest online discussion forums. We compiled a unique data set from the online discussion forum reddit that contains information on the political ideology of around 91,000 users, as well as records of their comment frequency and the comments' text corpus in over 190,000 different subforums of interest. Applying a set of statistical learning approaches, we show that information about activity in non-political discussion forums alone can very accurately predict a user's political ideology. Depending on the model, we are able to predict the economic dimension of ideology with an accuracy of up to 90.63% and the social dimension with an accuracy of up to 82.02%. In comparison, using the textual features from actual comments does not improve predictive accuracy. Our paper highlights the importance of revealed digital behaviour as a complement to stated preferences from digital communication when analysing human preferences and behaviour using online data.
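The abstract does not say which statistical learning approaches are used. As a minimal sketch, assuming each user is represented by a vector of comment counts per subforum and labelled with a binary position on one ideology dimension, the idea of predicting ideology from activity alone can be illustrated with L2-regularised logistic regression in plain Python. The subforum names, the synthetic generator `make_toy_users`, the log(1 + count) scaling and all hyperparameters are hypothetical. They do not reflect the paper's data, models or results.

```python
import math
import random

# Hypothetical subforum vocabulary; not taken from the paper's data.
VOCAB = ["subforum_a", "subforum_b", "subforum_c"]


def make_toy_users(n, seed=0):
    """Generate synthetic users (subforum -> comment count) and 0/1 labels.

    The label stands in for one ideology dimension. Label-1 users comment more
    in subforum_a, label-0 users in subforum_b; subforum_c is pure noise.
    """
    rng = random.Random(seed)
    users, labels = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        users.append({
            "subforum_a": rng.randint(5, 20) if label else rng.randint(0, 3),
            "subforum_b": rng.randint(0, 3) if label else rng.randint(5, 20),
            "subforum_c": rng.randint(0, 10),
        })
        labels.append(label)
    return users, labels


def build_features(users, vocab):
    """Map each user's comment counts to log(1 + count) features in vocab order."""
    return [[math.log1p(u.get(s, 0)) for s in vocab] for u in users]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, lr=0.1, epochs=500, l2=0.01):
    """Fit L2-regularised logistic regression with full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the data gradient and add the L2 penalty (bias is not penalised).
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    """Predict 1 when the linear score is positive, else 0."""
    return [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Train on the first 150 synthetic users, evaluate on the remaining 50.
users, labels = make_toy_users(200, seed=42)
X = build_features(users, VOCAB)
w, b = train_logreg(X[:150], labels[:150])
acc = accuracy(labels[150:], predict(w, b, X[150:]))
print(f"held-out accuracy on toy data: {acc:.2%}")
```

Log-scaling the counts is one assumed way to keep very active users from dominating the fit. Comparing this activity-only model against one that also uses text features from the comments would mirror the comparison the abstract describes, but the sketch makes no claim about how the paper built either model.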