Paper title
Flexible machine learning estimation of conditional average treatment effects: a blessing and a curse
Paper authors
Paper abstract
Causal inference from observational data requires untestable identification assumptions. If these assumptions hold, machine learning (ML) methods can be used to study complex forms of causal effect heterogeneity. Several ML methods have recently been developed to estimate the conditional average treatment effect (CATE). If the features at hand cannot explain all heterogeneity, the individual treatment effects (ITEs) can deviate seriously from the CATE. In this work, we demonstrate how the distributions of the ITE and the CATE can differ when a causal random forest (CRF) is applied. We extend the CRF to estimate the difference in conditional variance between treated and control individuals. If the ITE distribution equals the CATE distribution, this estimated difference in variance should be small. If they differ, an additional causal assumption is necessary to quantify the heterogeneity not captured by the CATE distribution. The conditional variance of the ITE can be identified when, given the measured features, the individual effect is independent of the outcome under no treatment. Then, in cases where the ITE and CATE distributions differ, the extended CRF can appropriately estimate the variance of the ITE distribution, whereas the CRF fails to do so.
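The identification argument in the abstract can be illustrated with a small simulation. The notation here is an assumption, not taken from the paper: write Y^0 for the outcome under no treatment and Y^1 = Y^0 + ITE for the outcome under treatment. If the ITE is independent of Y^0 given the features X, the covariance term vanishes and Var(ITE | X) = Var(Y^1 | X) - Var(Y^0 | X). The sketch below is not the extended CRF: it swaps in a plain stratified estimator on one binary feature with randomized treatment, and all function names and parameter values are hypothetical.

```python
import numpy as np


def simulate(n, rng):
    """Simulate a randomized study with one binary feature (illustrative values only)."""
    x = rng.integers(0, 2, size=n)               # measured binary feature
    a = rng.integers(0, 2, size=n)               # randomized treatment indicator
    y0 = 1.0 + x + rng.normal(0.0, 1.0, size=n)  # outcome under no treatment
    # ITE is constant (0.5) when x == 0 and varies around 1.5 with variance 4 when x == 1.
    # Its noise is drawn independently of y0, so the extra causal assumption holds.
    sd_ite = np.where(x == 1, 2.0, 0.0)
    ite = 0.5 + x + sd_ite * rng.normal(0.0, 1.0, size=n)
    y = y0 + a * ite                             # observed outcome
    return x, a, y, ite


def stratum_estimates(x, a, y):
    """Per feature level: (difference in means, difference in sample variances)."""
    out = {}
    for level in np.unique(x):
        treated = y[(x == level) & (a == 1)]
        control = y[(x == level) & (a == 0)]
        cate = treated.mean() - control.mean()
        var_diff = treated.var(ddof=1) - control.var(ddof=1)
        out[int(level)] = (cate, var_diff)
    return out


rng = np.random.default_rng(0)
x, a, y, ite = simulate(200_000, rng)
for level, (cate, var_diff) in stratum_estimates(x, a, y).items():
    true_var = ite[x == level].var()
    print(f"x={level}: CATE={cate:.2f}, variance difference={var_diff:.2f}, "
          f"true Var(ITE|x)={true_var:.2f}")
```

In the stratum where the ITE is constant, the estimated variance difference should be close to zero, so the CATE describes every individual there. In the other stratum the CATE alone hides the spread of individual effects, while the variance difference should recover it, provided the independence assumption holds.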