论文标题
随机数值线性代数中的确定点过程
Determinantal Point Processes in Randomized Numerical Linear Algebra
论文作者
论文摘要
Randomized Numerical Linear Algebra (RandNLA) uses randomness to develop improved algorithms for matrix problems that arise in scientific computing, data science, machine learning, etc. Determinantal Point Processes (DPPs), a seemingly unrelated topic in pure and applied mathematics, is a class of stochastic point processes with probability distribution characterized by sub-determinants of a kernel matrix.最近的工作发现了DPP和Randnla之间的深厚而富有成果的联系,这导致了这两个领域感兴趣的新保证和改进的算法。我们概述了这一令人兴奋的新研究系列,包括对Randnla和DPP的简要介绍,以及DPPS在经典线性代数任务(例如最小二乘回归,低排列近似值和NyStröm方法)中的应用。例如,使用DPP进行随机抽样会导致最小二乘的新型无偏估计量,从而使对这些算法的统计和推理理解更加完善。从某种意义上说,DPP是NyStröm方法的最佳随机算法。 Randnla技术称为杠杆得分采样,可以得出作为DPP的边际分布。我们还讨论了最近的算法开发,说明,尽管不如标准Randnla技术高效,但基于DPP的算法的效率只是中等昂贵的算法。
Randomized Numerical Linear Algebra (RandNLA) uses randomness to develop improved algorithms for matrix problems that arise in scientific computing, data science, machine learning, etc. Determinantal Point Processes (DPPs), a seemingly unrelated topic in pure and applied mathematics, is a class of stochastic point processes with probability distribution characterized by sub-determinants of a kernel matrix. Recent work has uncovered deep and fruitful connections between DPPs and RandNLA which lead to new guarantees and improved algorithms that are of interest to both areas. We provide an overview of this exciting new line of research, including brief introductions to RandNLA and DPPs, as well as applications of DPPs to classical linear algebra tasks such as least squares regression, low-rank approximation and the Nyström method. For example, random sampling with a DPP leads to new kinds of unbiased estimators for least squares, enabling more refined statistical and inferential understanding of these algorithms; a DPP is, in some sense, an optimal randomized algorithm for the Nyström method; and a RandNLA technique called leverage score sampling can be derived as the marginal distribution of a DPP. We also discuss recent algorithmic developments, illustrating that, while not quite as efficient as standard RandNLA techniques, DPP-based algorithms are only moderately more expensive.