Title

Statistical inference for principal components of spiked covariance matrices

Authors

Zhigang Bao, Xiucai Ding, Jingming Wang, Ke Wang

Abstract

In this paper, we study the asymptotic behavior of the extreme eigenvalues and eigenvectors of high-dimensional spiked sample covariance matrices in the supercritical case, where reliable detection of the spikes is possible. In particular, we derive the joint distribution of the extreme eigenvalues and the generalized components of the associated eigenvectors, i.e., the projections of the eigenvectors onto an arbitrary given direction, assuming that the dimension and the sample size are comparably large. In general, the joint distribution is given in terms of linear combinations of finitely many Gaussian and chi-square variables, with parameters depending on the projection direction and the spikes. Our assumptions on the spikes are fully general. First, the strengths of the spikes are only required to be slightly above the critical threshold, and no upper bound on the strengths is needed. Second, multiple spikes, i.e., spikes with the same strength, are allowed. Third, no structural assumption is imposed on the spikes. Thanks to this general setting, we can apply the results to various high-dimensional statistical hypothesis testing problems involving both the eigenvalues and eigenvectors. Specifically, we propose accurate and powerful statistics for hypothesis testing on the principal components. These statistics are data-dependent and adaptive to the underlying true spikes. Numerical simulations confirm the accuracy and power of the proposed statistics and show significantly better performance than existing methods in the literature. In particular, our methods remain accurate and powerful even when the spikes are small or the dimension is large.
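
The supercritical regime described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method: it assumes Gaussian data, a single rank-one spike with population covariance I + d·vvᵀ, and it compares the top sample eigenvalue and the squared projection of the top sample eigenvector onto v with the classical first-order limits for spiked models (the paper studies the fluctuations around such limits). The function name `simulate_spiked_covariance` and all parameter names are hypothetical, chosen only for this example.

```python
import numpy as np


def simulate_spiked_covariance(p=400, n=800, d=2.0, seed=0):
    """Simulate one rank-one spiked sample covariance matrix.

    Assumptions (illustrative only): Gaussian data, population covariance
    I + d * v v^T with v the first coordinate vector, aspect ratio c = p / n.
    The spike is supercritical when d > sqrt(c).
    """
    rng = np.random.default_rng(seed)
    c = p / n

    # Spike direction v = e_1.
    v = np.zeros(p)
    v[0] = 1.0

    # Rows of X have covariance I + d * v v^T: scale coordinate 0 by sqrt(1 + d).
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(1.0 + d)

    # Sample covariance matrix and its spectral decomposition
    # (np.linalg.eigh returns eigenvalues in ascending order).
    S = X.T @ X / n
    eigvals, eigvecs = np.linalg.eigh(S)
    top_val = eigvals[-1]
    top_vec = eigvecs[:, -1]

    # Squared projection of the top sample eigenvector onto v
    # (a "generalized component" in the abstract's terminology).
    overlap = float(v @ top_vec) ** 2

    # Classical first-order limits for a supercritical spike (d > sqrt(c)).
    predicted_val = (1.0 + d) * (1.0 + c / d)
    predicted_overlap = (1.0 - c / d**2) / (1.0 + c / d)

    return top_val, overlap, predicted_val, predicted_overlap


if __name__ == "__main__":
    top_val, overlap, pred_val, pred_overlap = simulate_spiked_covariance()
    print(f"top eigenvalue: {top_val:.3f} (first-order limit {pred_val:.3f})")
    print(f"squared overlap: {overlap:.3f} (first-order limit {pred_overlap:.3f})")
```

With the default values, c = 0.5 and d = 2 exceeds the critical threshold sqrt(c), so the top sample eigenvalue separates from the bulk and its eigenvector carries information about v. Reducing d toward sqrt(c) makes both quantities noisier, which is the regime where the paper's distributional results are intended to help.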
