Paper Title

Predictive Power of Nearest Neighbors Algorithm under Random Perturbation

Paper Authors

Yue Xing, Qifan Song, Guang Cheng

Paper Abstract

We consider a data corruption scenario in the classical $k$ Nearest Neighbors ($k$-NN) algorithm, that is, the testing data are randomly perturbed. Under such a scenario, the impact of corruption level on the asymptotic regret is carefully characterized. In particular, our theoretical analysis reveals a phase transition phenomenon that, when the corruption level $\omega$ is below a critical order (i.e., small-$\omega$ regime), the asymptotic regret remains the same; when it is beyond that order (i.e., large-$\omega$ regime), the asymptotic regret deteriorates polynomially. Surprisingly, we obtain a negative result that the classical noise-injection approach will not help improve the testing performance in the beginning stage of the large-$\omega$ regime, even in the level of the multiplicative constant of asymptotic regret. As a technical by-product, we prove that under different model assumptions, the pre-processed 1-NN proposed in \cite{xue2017achieving} will at most achieve a sub-optimal rate when the data dimension $d>4$ even if $k$ is chosen optimally in the pre-processing step.
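To make the corruption scenario in the abstract concrete, below is a minimal, illustrative sketch (not the paper's exact setting or analysis): it evaluates scikit-learn's $k$-NN classifier on synthetic data whose test inputs are perturbed by Gaussian noise of level $\omega$, and compares it against a simple noise-injection baseline that adds noise of the same level to the training inputs. The data-generating model, the value of $\omega$, and the choice $k \approx n^{4/(d+4)}$ are assumptions made only for this demonstration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Toy binary-classification data; the data-generating model is purely illustrative.
n, d = 2000, 3
X_train = rng.uniform(size=(n, d))
y_train = (X_train.sum(axis=1) + 0.1 * rng.standard_normal(n) > d / 2).astype(int)

X_test = rng.uniform(size=(500, d))
y_test = (X_test.sum(axis=1) > d / 2).astype(int)

# Corruption level omega: std. dev. of the random perturbation applied to test inputs.
omega = 0.05
X_test_corrupt = X_test + omega * rng.standard_normal(X_test.shape)

# A common regret-optimal order for k in d dimensions; an assumption for this sketch.
k = int(n ** (4 / (d + 4)))

# Plain k-NN trained on clean data, evaluated on clean vs. perturbed test points.
knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
acc_clean = knn.score(X_test, y_test)
acc_corrupt = knn.score(X_test_corrupt, y_test)

# Noise-injection baseline: perturb the training inputs at the same level omega,
# refit, and evaluate on the same perturbed test set.
X_train_noisy = X_train + omega * rng.standard_normal(X_train.shape)
knn_noisy = KNeighborsClassifier(n_neighbors=k).fit(X_train_noisy, y_train)
acc_injection = knn_noisy.score(X_test_corrupt, y_test)

print(f"clean test accuracy:      {acc_clean:.3f}")
print(f"corrupted test accuracy:  {acc_corrupt:.3f}")
print(f"noise-injection accuracy: {acc_injection:.3f}")
```

Under the paper's result, one would not expect the noise-injection baseline to improve on plain $k$-NN in the early large-$\omega$ regime; this toy experiment only illustrates the setup, not the asymptotic claim.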
