Paper title
Posterior Predictive Propensity Scores and $p$-Values
Paper authors
Paper abstract
\citet{Rosenbaum83ps} introduced the notion of the propensity score and discussed its central role in causal inference with observational studies. Their paper, however, created a fundamental incoherence with an earlier paper by \citet{Rubin78}, which showed that the propensity score plays no role in the Bayesian analysis of unconfounded observational studies if the priors on the propensity score and outcome models are independent. Despite serious efforts in the literature, it is generally difficult to reconcile these contradictory results. We offer a simple approach to incorporating the propensity score in Bayesian causal inference based on the posterior predictive $p$-value. To motivate a simple procedure, we focus on the model with the strong null hypothesis of no causal effects for any units whatsoever. Computationally, the proposed posterior predictive $p$-value equals the classic $p$-value based on the Fisher randomization test averaged over the posterior predictive distribution of the propensity score. Moreover, using the studentized doubly robust estimator as the test statistic, the proposed $p$-value inherits the doubly robust property and is also asymptotically valid for testing the weak null hypothesis of zero average causal effect. Perhaps surprisingly, this Bayesianly motivated $p$-value can have better frequentist finite-sample performance than the frequentist $p$-value based on the asymptotic approximation, especially when the propensity scores can take extreme values.
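The computational recipe in the abstract (a Fisher randomization test averaged over posterior draws of the propensity score) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: a single discrete covariate, independent Beta(1, 1) priors on the propensity score within each covariate level, one replicated assignment per posterior draw, and an absolute Hajek (normalized inverse-probability-weighted) contrast as the test statistic in place of the studentized doubly robust estimator. The function names `hajek_stat` and `ppp_value` are hypothetical.

```python
import numpy as np


def hajek_stat(z, y, e):
    """Absolute Hajek (normalized IPW) contrast; returns 0.0 if an arm is empty."""
    w1 = z / e                # weights for treated units
    w0 = (1 - z) / (1 - e)    # weights for control units
    if w1.sum() == 0 or w0.sum() == 0:
        return 0.0            # illustrative convention for a degenerate assignment
    return abs((w1 * y).sum() / w1.sum() - (w0 * y).sum() / w0.sum())


def ppp_value(x, z, y, n_draws=2000, seed=0):
    """Monte Carlo posterior predictive p-value under the strong null.

    x: discrete covariate, z: binary treatment, y: outcome (1-D arrays).
    Under the strong null the outcomes y stay fixed when z is re-drawn.
    """
    rng = np.random.default_rng(seed)
    # Per-level unit counts and treated counts for the conjugate Beta update.
    _, idx, n = np.unique(x, return_inverse=True, return_counts=True)
    n1 = np.bincount(idx, weights=z, minlength=len(n))
    exceed = 0
    for _ in range(n_draws):
        # Posterior draw of the propensity score at each covariate level
        # (assumed Beta(1, 1) prior, so the posterior is Beta(1 + n1, 1 + n0)).
        e = rng.beta(1 + n1, 1 + n - n1)[idx]
        # Replicated assignment from the drawn propensity scores.
        z_rep = rng.binomial(1, e)
        # Compare replicated and observed statistics under the same draw.
        exceed += int(hajek_stat(z_rep, y, e) >= hajek_stat(z, y, e))
    return exceed / n_draws
```

Calling `ppp_value(x, z, y)` on three equal-length NumPy arrays returns a Monte Carlo estimate in $[0, 1]$. The sketch draws one replicated assignment per posterior draw, which averages the randomization test over the posterior of the propensity score in a single loop; a continuous covariate would require a different propensity score model and posterior sampler.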