Paper Title

Testing for Regression Heteroskedasticity with High-Dimensional Random Forests

Author

Chien-Ming Chi

Abstract

Statistical inference for high-dimensional regression heteroskedasticity is an important but under-explored problem. This paper aims to fill this gap by proposing two tests, namely the variance difference test and the variance difference Breusch-Pagan test, for assessing high-dimensional regression heteroskedasticity. The former tests whether an explanatory feature of interest is associated with the conditional variance of the response variable, while the latter tests for heteroskedasticity in the regression, which is known as the Breusch-Pagan test problem. To formally establish the tests, we derive rigorous P-values and test sizes, and analyze the test power under a nonparametric heteroskedastic data-generating model with high-dimensional input features. This model setting accommodates high-dimensional applications with flexible heteroskedasticity structures and with features that have interaction effects on the mean of the response; such applications are common in many fields, including biology. Our methods leverage machine learning mean prediction methods such as random forests and use knockoff variables as negative controls. In particular, the definition of knockoffs for our test statistics is more flexible than the original definition of knockoffs; we give a detailed comparison of the two definitions and discuss the advantages of our knockoffs. The satisfactory empirical performance of the proposed tests is illustrated with simulation results and an HIV (Human Immunodeficiency Virus) case study.
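For context on the "Breusch-Pagan test problem" named above, the following is a minimal sketch of the classical low-dimensional Breusch-Pagan test in Koenker's studentized form. It is not the paper's random-forest and knockoff-based procedure: it assumes a linear mean model fitted by OLS and n > p, and it computes LM = n * R^2 from regressing squared residuals on the features. The function name `breusch_pagan_lm` and the simulated data are hypothetical illustrations only.

```python
import numpy as np


def breusch_pagan_lm(X, y):
    """Koenker's studentized Breusch-Pagan LM statistic (classical, assumes n > p).

    Returns (LM, df). Under homoskedasticity with a correctly specified
    linear mean, LM is approximately chi-square with df = p degrees of freedom.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])  # design matrix with intercept

    # Step 1: OLS fit of the (assumed linear) mean model; keep squared residuals.
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e2 = (y - Z @ beta) ** 2

    # Step 2: auxiliary OLS regression of the squared residuals on the same features.
    gamma, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    resid_aux = e2 - Z @ gamma
    r2 = 1.0 - np.sum(resid_aux**2) / np.sum((e2 - e2.mean()) ** 2)

    # Step 3: LM = n * R^2 of the auxiliary regression.
    return n * r2, p


# Hypothetical simulated example: three features, error variance driven by X[:, 0].
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
coef = np.array([1.0, -1.0, 0.5])
y_homo = X @ coef + rng.normal(size=n)                             # constant variance
y_hetero = X @ coef + np.exp(0.5 * X[:, 0]) * rng.normal(size=n)  # Var = exp(X1)

lm_homo, df = breusch_pagan_lm(X, y_homo)
lm_hetero, _ = breusch_pagan_lm(X, y_hetero)
print(f"homoskedastic:   LM = {lm_homo:.2f} (df = {df})")
print(f"heteroskedastic: LM = {lm_hetero:.2f} (df = {df})")
```

The statistic is compared with a chi-square critical value with df degrees of freedom (about 7.81 at the 5% level for df = 3), or converted to a P-value with, e.g., `scipy.stats.chi2.sf(lm, df)`. Because both steps rely on OLS over the raw features, this classical version needs n > p and a linear mean, which is the setting the abstract's high-dimensional, nonparametric tests move beyond.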