Title
Estimating multi-index models with response-conditional least squares
Authors
Abstract
The multi-index model is a simple yet powerful high-dimensional regression model that circumvents the curse of dimensionality by assuming $ \mathbb{E} [ Y | X ] = g(A^\top X) $ for some unknown index space $A$ and link function $g$. In this paper, we introduce a method for estimating the index space, and study the propagation error of an index space estimate in the regression of the link function. The proposed method approximates the index space by the span of linear regression slope coefficients computed over level sets of the data. Being based on ordinary least squares, our approach is easy to implement and computationally efficient. We prove a tight concentration bound that shows $N^{-1/2}$-convergence, but also faithfully describes the dependence on the chosen partition of level sets, hence giving guidance on hyperparameter tuning. The estimator's competitiveness is confirmed by extensive comparisons with state-of-the-art methods, on both synthetic and real data sets. As a second contribution, we establish minimax optimal generalization bounds for $k$-nearest neighbors and piecewise polynomial regression when trained on samples projected onto any $N^{-1/2}$-consistent estimate of the index space, thus providing complete and provable estimation of the multi-index model.
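The estimation idea described in the abstract can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the function name `rcls_index_space`, the quantile-based partition into `J` level sets, and the use of an SVD to extract the span of the slope vectors are all illustrative choices; `J` (number of level sets) and `d` (index space dimension) are the hyperparameters alluded to in the abstract.

```python
import numpy as np

def rcls_index_space(X, y, d, J=10):
    """Sketch of response-conditional least squares: estimate the
    index space of E[Y|X] = g(A^T X) by the span of OLS slope
    coefficients computed on level sets of the response.
    Hypothetical hyperparameters: J level sets, index dimension d."""
    N, p = X.shape
    # Partition the samples into J level sets via response quantiles.
    edges = np.quantile(y, np.linspace(0.0, 1.0, J + 1))
    slopes = []
    for j in range(J):
        if j < J - 1:
            mask = (y >= edges[j]) & (y < edges[j + 1])
        else:  # make the last bin inclusive on the right
            mask = (y >= edges[j]) & (y <= edges[j + 1])
        Xj, yj = X[mask], y[mask]
        if len(yj) <= p:
            continue  # too few samples for a stable OLS fit
        # OLS slope within the level set (intercept handled by centering).
        Xc = Xj - Xj.mean(axis=0)
        beta, *_ = np.linalg.lstsq(Xc, yj - yj.mean(), rcond=None)
        slopes.append(beta)
    # The index space estimate is spanned by the leading left singular
    # vectors of the matrix of stacked slope vectors.
    B = np.stack(slopes, axis=1)          # shape (p, #usable level sets)
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    return U[:, :d]                       # orthonormal basis estimate of A
```

For a single-index model with a monotone link (e.g. $y = (a^\top x)^3$ with Gaussian design), each level-set slope is proportional to $a$ in population, so the leading singular vector recovers the index direction.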