Paper Title

Alternating Minimization Converges Super-Linearly for Mixed Linear Regression

Authors

Avishek Ghosh, Kannan Ramchandran

Abstract

We address the problem of solving mixed random linear equations. We have unlabeled observations coming from multiple linear regressions, and each observation corresponds to exactly one of the regression models. The goal is to learn the linear regressors from the observations. Classically, Alternating Minimization (AM), a variant of Expectation Maximization (EM), is used to solve this problem. AM iteratively alternates between estimating the labels and solving the regression problems with the estimated labels. Empirically, it is observed that, for a large variety of non-convex problems including mixed linear regression, AM converges at a much faster rate than gradient-based algorithms. However, the existing theory suggests similar rates of convergence for AM and gradient-based methods, failing to capture this empirical behavior. In this paper, we close this gap between theory and practice for the special case of a mixture of $2$ linear regressions. We show that, provided it is initialized properly, AM enjoys a \emph{super-linear} rate of convergence in certain parameter regimes. To the best of our knowledge, this is the first work that theoretically establishes such a rate for AM. Hence, if we want to recover the unknown regressors up to an error (in $\ell_2$ norm) of $\epsilon$, AM only takes $\mathcal{O}(\log \log (1/\epsilon))$ iterations. Furthermore, we empirically compare AM with a gradient-based heuristic algorithm and show that AM dominates in iteration complexity as well as wall-clock time.
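To make the alternating structure concrete, here is a minimal sketch of AM for a mixture of $2$ linear regressions, assuming noiseless observations $y_i = \langle x_i, \beta_{z_i} \rangle$ with a balanced mixture. The function and variable names (`am_mixed_linear_regression`, `X`, `y`, `beta1`, `beta2`) and the random initialization are illustrative, not from the paper; the paper's super-linear guarantee requires a suitably good initialization rather than a random one.

```python
import numpy as np

def am_mixed_linear_regression(X, y, n_iters=10, seed=0):
    """Alternate between label estimation and per-label least squares."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Illustrative random initialization; the theory assumes a proper
    # (e.g., spectral) initialization close enough to the true regressors.
    beta1, beta2 = rng.standard_normal(d), rng.standard_normal(d)
    for _ in range(n_iters):
        # Step 1: estimate labels -- assign each sample to the regressor
        # with the smaller squared residual.
        r1 = (y - X @ beta1) ** 2
        r2 = (y - X @ beta2) ** 2
        mask = r1 <= r2
        if mask.all() or not mask.any():
            break  # degenerate assignment; in practice, reinitialize
        # Step 2: solve the two least-squares problems on the estimated labels.
        beta1, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        beta2, *_ = np.linalg.lstsq(X[~mask], y[~mask], rcond=None)
    return beta1, beta2

# Toy usage: two ground-truth regressors, balanced noiseless mixture.
rng = np.random.default_rng(1)
n, d = 2000, 5
X = rng.standard_normal((n, d))
b_star1, b_star2 = rng.standard_normal(d), rng.standard_normal(d)
labels = rng.integers(0, 2, size=n)
y = np.where(labels == 0, X @ b_star1, X @ b_star2)
b1, b2 = am_mixed_linear_regression(X, y)
```

Each iteration solves the regression subproblems exactly rather than taking a gradient step, which is the structural difference the paper exploits to prove the $\mathcal{O}(\log \log (1/\epsilon))$ iteration count.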
