Title
Fractional norms and quasinorms do not help to overcome the curse of dimensionality
Authors
Abstract
The curse of dimensionality causes well-known and widely discussed problems for machine learning methods. There is a hypothesis that using the Manhattan distance, or even fractional quasinorms lp (for p less than 1), can help to overcome the curse of dimensionality in classification problems. In this study, we systematically test this hypothesis. We confirm that fractional quasinorms have a greater relative contrast or coefficient of variation than the Euclidean norm l2, but we also demonstrate that the distance concentration shows qualitatively the same behaviour for all tested norms and quasinorms, and that the difference between them decays as the dimension tends to infinity. Estimation of classification quality for kNN based on different norms and quasinorms shows that a greater relative contrast does not imply better classifier performance, and the worst performance on different databases was produced by different norms (quasinorms). A systematic comparison shows that the difference in the performance of kNN based on lp for p=2, 1, and 0.5 is statistically insignificant.
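The concentration effect described in the abstract is easy to reproduce empirically. The sketch below (an illustration, not the authors' experimental code; point counts, dimensions, and the uniform-cube data model are assumptions for the demo) computes the relative contrast RC = (D_max - D_min) / D_min of lp distances from a random query to a sample of random points. Fractional quasinorms (p = 0.5) give a larger RC than l2 in high dimension, yet RC shrinks with dimension for every p:

```python
import random


def lp_dist(x, y, p):
    """Minkowski lp distance; a quasinorm rather than a norm when p < 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def relative_contrast(dim, p, n_points=200, seed=0):
    """RC = (D_max - D_min) / D_min for uniform random points in [0, 1]^dim.

    A large RC means near and far neighbours are well separated; as RC
    approaches 0, distances concentrate and nearest-neighbour queries
    become less meaningful.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        lp_dist(query, [rng.random() for _ in range(dim)], p)
        for _ in range(n_points)
    ]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    for p in (0.5, 1.0, 2.0):
        print(
            f"p={p}: RC(d=2)={relative_contrast(2, p):.2f}  "
            f"RC(d=500)={relative_contrast(500, p):.2f}"
        )
```

Running this shows both halves of the abstract's claim: at d=500 the quasinorm p=0.5 retains a noticeably larger relative contrast than p=2, while for every p the contrast at d=500 is far below its value at d=2 — the qualitative concentration behaviour is the same for all tested norms and quasinorms.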