Paper Title
Selecting Subpopulations for Causal Inference in Regression Discontinuity Designs
Paper Authors
Paper Abstract
The Brazilian Bolsa Familia (BF) program is a conditional cash transfer program that aims to reduce short-term poverty through direct cash transfers and to fight long-term poverty by increasing human capital among poor Brazilians. Eligibility for Bolsa Familia benefits depends on a cutoff rule, which makes the BF study a regression discontinuity (RD) design. Extracting causal information from RD studies is challenging. Following Li et al. (2015) and Branson and Mealli (2019), we formally describe the BF RD design as a local randomized experiment within the potential outcome approach. Under this framework, causal effects can be identified and estimated on a subpopulation where a local overlap assumption, a local SUTVA and a local ignorability assumption hold. We first discuss the potential advantages of this framework over local regression methods based on continuity assumptions; these advantages concern the definition of the causal estimands, the design and analysis of the study, and the interpretation and generalizability of the results. A critical issue of this local randomization approach is how to choose subpopulations for which we can draw valid causal inference. We propose a Bayesian model-based finite mixture approach to clustering that classifies observations into subpopulations for which the RD assumptions do and do not hold. This approach has important advantages: a) it accounts for the uncertainty in subpopulation membership, which is typically neglected; b) it does not impose any constraint on the shape of the subpopulation; c) it is scalable to high-dimensional settings; d) it allows targeting causal estimands other than the average treatment effect (ATE); and e) it is robust to a certain degree of manipulation/selection of the running variable. We apply our proposed approach to assess the causal effects of the Bolsa Familia program on leprosy incidence in 2009.
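To make the local randomization idea concrete, the following is a minimal sketch of how an effect could be estimated once a subpopulation around the cutoff has been chosen. It is not the Bayesian finite mixture approach proposed in the paper: the subpopulation here is a fixed symmetric window, which is exactly the kind of shape constraint the proposed method avoids. The function name `local_randomization_estimate`, the `half_width` parameter, the rule that units at or below the cutoff are eligible, and all numbers are assumptions made for illustration only.

```python
from statistics import mean


def local_randomization_estimate(running, outcome, cutoff, half_width):
    """Difference in mean outcomes (eligible minus ineligible) among units
    whose running variable lies within half_width of the cutoff.

    Simplified stand-in: a fixed window replaces the model-based
    subpopulation selection described in the abstract.
    """
    treated, control = [], []
    for s, y in zip(running, outcome):
        if abs(s - cutoff) > half_width:
            continue  # unit falls outside the candidate subpopulation
        # Assumption: units at or below the cutoff are eligible (treated).
        if s <= cutoff:
            treated.append(y)
        else:
            control.append(y)
    if not treated or not control:
        raise ValueError("window must contain units on both sides of the cutoff")
    return mean(treated) - mean(control)


# Toy values, invented for illustration; they are not from the BF study.
running = [100, 110, 118, 119, 121, 122, 130, 140]
outcome = [5, 4, 2, 1, 3, 4, 6, 7]
estimate = local_randomization_estimate(running, outcome, cutoff=120, half_width=5)
```

Treating the units inside the window as if they were randomly assigned is only defensible when the local overlap, local SUTVA and local ignorability assumptions hold there, which is why the choice of subpopulation is the central problem the abstract addresses.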