Title
POLITICS: Pretraining with Same-story Article Comparison for Ideology Prediction and Stance Detection
Authors
Abstract
Ideology is at the core of political science research. Yet there are still no general-purpose tools to characterize and predict ideology across different genres of text. To this end, we study pretrained language models trained with novel ideology-driven pretraining objectives that rely on comparing articles about the same story written by media outlets of different ideologies. We further collect a large-scale pretraining dataset of more than 3.6M political news articles. Our model, POLITICS, outperforms strong baselines and the previous state-of-the-art models on ideology prediction and stance detection tasks. Further analyses show that POLITICS is especially good at understanding long or formally written texts, and is also robust in few-shot learning scenarios.
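The abstract does not spell out the exact form of the same-story comparison objective. A common way to turn such comparisons into a training signal is a triplet margin loss over document embeddings, and the sketch below illustrates that idea under stated assumptions. It is not the authors' implementation. Everything here is hypothetical and chosen for illustration: `Article`, `story_id`, `ideology`, `ideology_triplets`, `ideology_loss`, Euclidean distance, and the triplet selection rule. That rule takes as the positive an article with the same ideology but a different story, and as the negative an article on the same story but from a different ideology.

```python
import math
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Article:
    story_id: str           # hypothetical field: which news story the article covers
    ideology: str           # hypothetical label, e.g. "left", "center", "right"
    embedding: List[float]  # document vector, assumed to come from the language model


def euclidean(u: List[float], v: List[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """Hinge loss that becomes zero once the positive is at least `margin`
    closer to the anchor than the negative is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def ideology_triplets(articles: List[Article]) -> Iterator[Tuple[Article, Article, Article]]:
    """Assumed selection rule (illustrative only): the positive shares the anchor's
    ideology but covers a different story; the negative covers the same story
    from a different ideology, so topic alone cannot separate them."""
    for a in articles:
        positives = [p for p in articles if p.ideology == a.ideology and p.story_id != a.story_id]
        negatives = [n for n in articles if n.story_id == a.story_id and n.ideology != a.ideology]
        for p in positives:
            for n in negatives:
                yield a, p, n


def ideology_loss(articles: List[Article], margin: float = 1.0) -> float:
    """Mean triplet loss over all triplets built from a batch of articles."""
    triplets = list(ideology_triplets(articles))
    if not triplets:
        return 0.0
    total = sum(triplet_margin_loss(a.embedding, p.embedding, n.embedding, margin)
                for a, p, n in triplets)
    return total / len(triplets)


if __name__ == "__main__":
    # Two stories, each covered by a left-leaning and a right-leaning outlet.
    batch = [
        Article("s1", "left", [0.0, 0.0]),
        Article("s1", "right", [1.0, 0.0]),
        Article("s2", "left", [0.0, 0.1]),
        Article("s2", "right", [1.0, 0.1]),
    ]
    print(round(ideology_loss(batch), 6))  # → 0.1
```

In this toy batch, the embeddings already group articles by ideology rather than by story. Each triplet therefore has a positive distance of 0.1 and a negative distance of 1.0, and only the margin leaves a small residual loss. Because the negatives are drawn from the same story, the encoder cannot lower the loss just by clustering articles by topic. This fits the abstract's motivation for same-story comparison, though the paper's actual objectives may be defined differently.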