Paper Title
Adjusting for Confounders with Text: Challenges and an Empirical Evaluation Framework for Causal Inference
Paper Authors
Paper Abstract
Causal inference studies using textual social media data can provide actionable insights on human behavior. Making accurate causal inferences with text requires controlling for confounding, which could otherwise impart bias. Recently, many different methods for adjusting for confounders have been proposed, and we show that these existing methods disagree with one another on two datasets inspired by previous social media studies. Evaluating causal methods is challenging, as ground-truth counterfactuals are almost never available. Presently, no empirical evaluation framework for causal methods using text exists, and as such, practitioners must select their methods without guidance. We contribute the first such framework, which consists of five tasks drawn from real-world studies. Our framework enables the evaluation of any causal inference method using text. Across 648 experiments and two datasets, we evaluate every commonly used causal inference method and identify their strengths and weaknesses to inform social media researchers seeking to use such methods and to guide future improvements. We make all tasks, data, and models public to inform applications and encourage additional research.