Paper Title

Improved Random Features for Dot Product Kernels

Paper Authors

Jonas Wacker, Motonobu Kanagawa, Maurizio Filippone

Paper Abstract

Dot product kernels, such as polynomial and exponential (softmax) kernels, are among the most widely used kernels in machine learning, as they enable modeling the interactions between input features, which is crucial in applications like computer vision, natural language processing, and recommender systems. We make several novel contributions for improving the efficiency of random feature approximations for dot product kernels, to make these kernels more useful in large-scale learning. First, we present a generalization of existing random feature approximations for polynomial kernels, such as Rademacher and Gaussian sketches and TensorSRHT, using complex-valued random features. We show empirically that the use of complex features can significantly reduce the variances of these approximations. Second, we provide a theoretical analysis for understanding the factors affecting the efficiency of various random feature approximations, by deriving closed-form expressions for their variances. These variance formulas elucidate conditions under which certain approximations (e.g., TensorSRHT) achieve lower variances than others (e.g., Rademacher sketches), and conditions under which the use of complex features leads to lower variances than real features. Third, by using these variance formulas, which can be evaluated in practice, we develop a data-driven optimization approach to improve random feature approximations for general dot product kernels, which is also applicable to the Gaussian kernel. We demonstrate the improvements brought by these contributions through extensive experiments on a variety of tasks and datasets.
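
To make the construction concrete, below is a minimal NumPy sketch (not the authors' implementation) of a degree-p Rademacher polynomial sketch and its complex-valued variant: each random feature is a product of p independent Rademacher projections, and the complex variant draws projection entries uniformly from the fourth roots of unity {1, -1, i, -i}. The function name, the default number of features, and the test data are illustrative assumptions.

```python
import numpy as np

def rademacher_polynomial_sketch(X, Y, degree=3, n_features=4096,
                                 complex_features=False, seed=0):
    """Unbiased random-feature estimate of the polynomial kernel (x^T y)^degree.

    Real case: projection entries drawn from {-1, +1} (Rademacher).
    Complex case: entries drawn uniformly from {1, -1, 1j, -1j}.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    choices = np.array([1, -1, 1j, -1j]) if complex_features else np.array([1.0, -1.0])
    # One independent (d x n_features) projection per factor of the product.
    W = rng.choice(choices, size=(degree, d, n_features))
    # Each feature is the product of `degree` independent linear projections.
    Zx = np.prod(X @ W, axis=0) / np.sqrt(n_features)  # shape (n, n_features)
    Zy = np.prod(Y @ W, axis=0) / np.sqrt(n_features)
    # Conjugating one side keeps the estimator unbiased in the complex case.
    return np.real(Zx @ Zy.conj().T)

# Quick check against the exact polynomial kernel on unit-norm data.
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 16))
Y = rng.standard_normal((4, 16))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Y /= np.linalg.norm(Y, axis=1, keepdims=True)
K_exact = (X @ Y.T) ** 3
err_real = np.abs(rademacher_polynomial_sketch(X, Y) - K_exact).max()
err_cplx = np.abs(rademacher_polynomial_sketch(X, Y, complex_features=True) - K_exact).max()
print(f"max abs error, real: {err_real:.4f}  complex: {err_cplx:.4f}")
```

Running both variants at the same number of features gives a simple way to probe the variance-reduction claim from the abstract empirically; the paper's variance formulas characterize exactly when the complex variant wins.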
