Paper Title
Statistical-Computational Trade-offs in Tensor PCA and Related Problems via Communication Complexity
Paper Authors
Paper Abstract
Tensor PCA is a stylized statistical inference problem introduced by Montanari and Richard to study the computational difficulty of estimating an unknown parameter from higher-order moment tensors. Unlike its matrix counterpart, Tensor PCA exhibits a statistical-computational gap, i.e., a sample size regime where the problem is information-theoretically solvable but conjectured to be computationally hard. This paper derives computational lower bounds on the run-time of memory-bounded algorithms for Tensor PCA using communication complexity. These lower bounds specify a trade-off among the number of passes through the data sample, the sample size, and the memory required by any algorithm that successfully solves Tensor PCA. While the lower bounds do not rule out polynomial-time algorithms, they do imply that many commonly used algorithms, such as gradient descent and the power method, must have a higher iteration count when the sample size is not large enough. Similar lower bounds are obtained for Non-Gaussian Component Analysis, a family of statistical estimation problems in which low-order moment tensors carry no information about the unknown parameter. Finally, stronger lower bounds are obtained for an asymmetric variant of Tensor PCA and related statistical estimation problems. These results explain why many estimators for these problems use a memory state that is significantly larger than the effective dimensionality of the parameter of interest.
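To make the setting concrete, here is a minimal sketch of the order-3 spiked tensor model and the tensor power method mentioned in the abstract. The specific parameter values (dimension `n`, signal-to-noise ratio `beta`, the warm start, and the iteration count) are illustrative choices, not taken from the paper; with a cold random start or a smaller `beta`, power iteration needs far more iterations, or fails, which is the regime the paper's lower bounds speak to.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
beta = 10.0 * np.sqrt(n)  # illustrative SNR, chosen comfortably above threshold

# Spiked tensor model: T = beta * (v ⊗ v ⊗ v) + W, with W iid N(0,1) noise
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
T = beta * np.einsum("i,j,k->ijk", v, v, v) + rng.standard_normal((n, n, n))

def tensor_power_method(T, u0, iters=100):
    """Iterate u <- T(., u, u), renormalizing at each step."""
    u = u0 / np.linalg.norm(u0)
    for _ in range(iters):
        u = np.einsum("ijk,j,k->i", T, u, u)
        u /= np.linalg.norm(u)
    return u

# Warm start correlated with the planted vector (a cold random start would
# require a much larger beta or many more iterations to succeed).
u0 = v + rng.standard_normal(n) / np.sqrt(n)
u_hat = tensor_power_method(T, u0)
print(abs(u_hat @ v))  # correlation with the planted spike
```

Note that each power iteration touches the full order-3 tensor, so the method's working memory scales like n^3 rather than the n parameters of interest; this gap between memory state and effective dimensionality is what the paper's lower bounds address.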