Paper Title

Learning the Positions in CountSketch

Authors

Simin Liu, Tianrui Liu, Ali Vakilian, Yulin Wan, David P. Woodruff

Abstract

We consider sketching algorithms which first quickly compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem, e.g., low rank approximation. In the learning-based sketching paradigm proposed by Indyk et al. [2019], the sketch matrix is found by choosing a random sparse matrix, e.g., the CountSketch, and then updating the values of the non-zero entries by running gradient descent on a training data set. Despite the growing body of work on this paradigm, a noticeable omission is that the locations of the non-zero entries of previous algorithms were fixed, and only their values were learned. In this work we propose the first learning algorithm that also optimizes the locations of the non-zero entries. We show this algorithm gives better accuracy for low rank approximation than previous work, and apply it to other problems such as $k$-means clustering for the first time. We show that our algorithm is provably better in the spiked covariance model and for Zipfian matrices. We also show the importance of the sketch monotonicity property for combining learned sketches. Our empirical results show the importance of optimizing not only the values of the non-zero entries but also their positions.
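To make the starting point of the abstract concrete, the following is a minimal NumPy sketch (an illustration only, not the authors' implementation) of the classical CountSketch the learning-based paradigm begins from: a sparse $m \times n$ matrix with exactly one nonzero entry of value $\pm 1$ per column, placed at a uniformly random row. In the learned variant described above, gradient descent updates the values of these nonzero entries, and the contribution of this paper is to also optimize their row positions. The sketch-and-solve step for low rank approximation shown here, projecting onto the top singular directions of the compressed matrix, is one standard recipe and is an assumption for illustration.

```python
import numpy as np

def countsketch_matrix(m, n, rng):
    """Random CountSketch matrix: exactly one +/-1 entry per column,
    at a uniformly random row. The learned variants optimize the signs
    (values) and, in this paper, also the row positions."""
    S = np.zeros((m, n))
    rows = rng.integers(0, m, size=n)        # position of each column's nonzero
    signs = rng.choice([-1.0, 1.0], size=n)  # value of each column's nonzero
    S[rows, np.arange(n)] = signs
    return S

rng = np.random.default_rng(0)
n, d, m = 1000, 50, 100
A = rng.standard_normal((n, d))   # data matrix to compress

S = countsketch_matrix(m, n, rng)
SA = S @ A                        # compressed data: m x d instead of n x d

# Sketch-and-solve for rank-k approximation: project A onto the top-k
# right singular directions of the much smaller sketch SA.
k = 10
_, _, Vt = np.linalg.svd(SA, full_matrices=False)
A_k = A @ Vt[:k].T @ Vt[:k]
err = np.linalg.norm(A - A_k, "fro") / np.linalg.norm(A, "fro")
print(SA.shape, round(err, 3))
```

Because S has one nonzero per column, computing `S @ A` takes time proportional to the number of nonzeros of A, which is what makes the compression step fast.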
