The Projected Covariance Measure for assumption-lean variable significance testing

Paper title

The Projected Covariance Measure for assumption-lean variable significance testing

Authors

Lundborg, Anton Rask, Kim, Ilmun, Shah, Rajen D., Samworth, Richard J.

Abstract

Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.
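The two-stage procedure described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: it assumes scalar $X$ and $Z$, swaps in a simple Gaussian kernel smoother where the abstract mentions additive models or random forests, and leaves out any refinements the full paper may specify. The names `kernel_regressor` and `pcm_sketch`, the bandwidth, and the one-sided normal calibration are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def kernel_regressor(features, targets, bandwidth=0.5):
    """Fit a Nadaraya-Watson smoother (Gaussian kernel); return its predict function."""
    features = [tuple(row) for row in features]
    targets = list(targets)
    fallback = fmean(targets)

    def predict(point):
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(point, row)) / (2 * bandwidth ** 2))
            for row in features
        ]
        total = sum(weights)
        if total == 0.0:  # query point far from every training point
            return fallback
        return sum(w * t for w, t in zip(weights, targets)) / total

    return predict


def pcm_sketch(x, y, z, seed=0):
    """Return (statistic, one-sided p-value) for H0: E[Y | X, Z] does not depend on X."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    half_a, half_b = idx[: len(idx) // 2], idx[len(idx) // 2:]

    # Half A: estimate a projection capturing how E[Y | X, Z] varies with X.
    g_hat = kernel_regressor([(x[i], z[i]) for i in half_a], [y[i] for i in half_a])
    g_on_a = [g_hat((x[i], z[i])) for i in half_a]
    g_given_z = kernel_regressor([(z[i],) for i in half_a], g_on_a)

    def f_hat(xi, zi):
        return g_hat((xi, zi)) - g_given_z((zi,))

    # Half B: residualise Y and f_hat(X, Z) on Z, then average the residual products.
    z_b = [(z[i],) for i in half_b]
    y_b = [y[i] for i in half_b]
    f_b = [f_hat(x[i], z[i]) for i in half_b]
    m_y = kernel_regressor(z_b, y_b)
    m_f = kernel_regressor(z_b, f_b)
    products = [(yi - m_y(zi)) * (fi - m_f(zi)) for yi, fi, zi in zip(y_b, f_b, z_b)]

    # Studentised mean of the products, compared with a standard normal.
    statistic = math.sqrt(len(products)) * fmean(products) / stdev(products)
    p_value = 1.0 - NormalDist().cdf(statistic)
    return statistic, p_value
```

Calling `pcm_sketch(x, y, z)` on three equal-length lists of floats returns the studentised statistic and a p-value; large positive values of the statistic count as evidence against conditional mean independence. Sample splitting keeps the estimated projection independent of the data used to form the test statistic, which is what makes the simple normal comparison plausible in this sketch.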
