Title
Augmented Transfer Regression Learning with Semi-non-parametric Nuisance Models
Authors
Abstract
In contemporary statistical learning, covariate shift correction plays an important role in transfer learning when the distribution of the test data differs from that of the training data. Importance weighting, a natural and principled strategy for adjusting for covariate shift, is commonly used in transfer learning. However, this strategy is not robust to model misspecification or to excessive estimation error. In this paper, we propose an augmented transfer regression learning (ATReL) approach that introduces an imputation model for the targeted response and uses it to augment the importance weighting equation. With novel semi-non-parametric constructions and calibrated moment estimating equations for the two nuisance models, our ATReL method is (i) less prone to the curse of dimensionality than nonparametric approaches and (ii) less sensitive to model misspecification than parametric approaches. We show that our ATReL estimator is root-n consistent when at least one nuisance model is correctly specified, that estimation of the parametric parts of the nuisance models attains the parametric rate, and that estimation of the nonparametric components is rate doubly robust. Simulation studies demonstrate that our method is more robust and efficient than existing parametric and fully nonparametric (machine learning) estimators under various configurations. We also examine the utility of our method through a real example on transferring a phenotyping algorithm for rheumatoid arthritis across different time windows. Finally, we propose ways to enhance the intrinsic efficiency of our estimator and to incorporate modern machine learning methods into our proposed framework.
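To make the contrast between plain importance weighting and an augmented equation concrete, here is a minimal NumPy sketch of the generic, fully parametric doubly robust (augmented importance weighting) estimator for a target-population linear regression y ~ 1 + x. It solves (1/n_T) Σ_target x̃_i {m(x_i) − x̃_iᵀβ} + (1/n_S) Σ_source w(x_i) x̃_i {y_i − m(x_i)} = 0, where w is the density ratio and m the imputation model. This is an illustrative assumption, not the authors' ATReL method: it omits their semi-non-parametric nuisance constructions and calibrated moment equations, it uses a one-dimensional simulated covariate with a Gaussian mean shift, and all function names (`density_ratio`, `fit_imputation`, `augmented_estimate`, etc.) are hypothetical.

```python
import numpy as np


def linear(X):
    """Features (1, x)."""
    return np.column_stack([np.ones(len(X)), X])


def quadratic(X):
    """Features (1, x, x^2)."""
    return np.column_stack([np.ones(len(X)), X, X**2])


def fit_logistic(Z, t, n_iter=25):
    """Unpenalized logistic regression of t on Z, fitted by Newton-Raphson."""
    gamma = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ gamma))
        grad = Z.T @ (t - p)
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z
        gamma += np.linalg.solve(hess, grad)
    return gamma


def density_ratio(X_src, X_tgt, features):
    """Estimate w(x) = p_target(x) / p_source(x) at the source points.

    A classifier for P(target | x) has odds (n_tgt p_target) / (n_src p_source),
    so the density ratio is the odds rescaled by n_src / n_tgt.
    """
    Z = np.vstack([features(X_src), features(X_tgt)])
    t = np.r_[np.zeros(len(X_src)), np.ones(len(X_tgt))]
    gamma = fit_logistic(Z, t)
    return np.exp(features(X_src) @ gamma) * len(X_src) / len(X_tgt)


def fit_imputation(X_src, y_src, features):
    """Least-squares imputation model m(x) for E[Y | X = x], fitted on labeled source data."""
    coef, *_ = np.linalg.lstsq(features(X_src), y_src, rcond=None)
    return lambda X: features(X) @ coef


def iw_estimate(X_src, y_src, w):
    """Importance-weighted least squares for the target regression y ~ 1 + x."""
    D = linear(X_src)
    return np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * y_src))


def augmented_estimate(X_src, y_src, X_tgt, w, m):
    """Solve the augmented (doubly robust) estimating equation for y ~ 1 + x."""
    D_s, D_t = linear(X_src), linear(X_tgt)
    lhs = D_t.T @ D_t / len(X_tgt)
    rhs = (D_t.T @ m(X_tgt) / len(X_tgt)                     # imputed target moment
           + D_s.T @ (w * (y_src - m(X_src))) / len(X_src))  # weighted residual correction
    return np.linalg.solve(lhs, rhs)


# Simulated covariate shift: source X ~ N(0, 1), target X ~ N(1, 1), labels only in source.
rng = np.random.default_rng(0)
n = 50_000
X_src = rng.normal(0.0, 1.0, n)
X_tgt = rng.normal(1.0, 1.0, n)
y_src = X_src + 0.5 * X_src**2 + rng.normal(0.0, 0.5, n)
# Under X ~ N(1, 1) the best linear fit to x + 0.5 x^2 is 0 + 2x, so beta_target = (0, 2).
beta_target = np.array([0.0, 2.0])

w = density_ratio(X_src, X_tgt, linear)           # correctly specified: log(p_T / p_S) = x - 0.5
ones = np.ones(n)                                 # misspecified weights: ignore the shift
m_good = fit_imputation(X_src, y_src, quadratic)  # correctly specified imputation model
m_bad = fit_imputation(X_src, y_src, linear)      # misspecified imputation model

print("unweighted OLS:      ", iw_estimate(X_src, y_src, ones))
print("importance weighted: ", iw_estimate(X_src, y_src, w))
print("augmented, both OK:  ", augmented_estimate(X_src, y_src, X_tgt, w, m_good))
print("augmented, bad m:    ", augmented_estimate(X_src, y_src, X_tgt, w, m_bad))
print("augmented, bad w:    ", augmented_estimate(X_src, y_src, X_tgt, ones, m_good))
```

In this toy setup, unweighted least squares on the source data converges to the source fit (0.5, 1). The augmented estimate should land near the target coefficients (0, 2) as long as either the weights or the imputation model is correctly specified, which is the parametric analogue of the "at least one nuisance model" robustness described above.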