Paper title
Improved Search of Relevant Points for Nearest-Neighbor Classification
Paper authors
Abstract
Given a training set $P \subset \mathbb{R}^d$, the nearest-neighbor classifier assigns any query point $q \in \mathbb{R}^d$ to the class of its closest point in $P$. To answer these classification queries, some training points are more relevant than others. We say a training point is relevant if its omission from the training set could induce the misclassification of some query point in $\mathbb{R}^d$. These relevant points are commonly known as border points, as they define the boundaries of the Voronoi diagram of $P$ that separate points of different classes. Being able to compute this set of points efficiently is crucial for reducing the size of the training set without affecting the accuracy of the nearest-neighbor classifier. Improving on a decades-old result by Clarkson, a recent paper by Eppstein proposed an output-sensitive algorithm that finds the set of border points of $P$ in $O(n^2 + nk^2)$ time, where $k$ is the size of this set. In this paper, we improve the time complexity of this algorithm to $O(nk^2)$ by proving that its first steps, which require $O(n^2)$ time, are unnecessary.
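To make the definitions in the abstract concrete, here is a minimal Python sketch. It is not the algorithm of Clarkson, of Eppstein, or of this paper. It only illustrates the nearest-neighbor rule and, for the special case $d = 1$ with distinct training points, the notion of a border point: on the real line, Voronoi neighbors are exactly the consecutive points in sorted order, so a point is a border point when an adjacent point has a different class. The function names `nn_classify` and `border_points_1d` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple


def nn_classify(points: Sequence[Tuple[float, ...]],
                labels: Sequence[str],
                q: Tuple[float, ...]) -> str:
    """Return the label of the training point closest to q (Euclidean distance)."""
    def sq_dist(p: Tuple[float, ...]) -> float:
        return sum((a - b) ** 2 for a, b in zip(p, q))

    best = min(range(len(points)), key=lambda i: sq_dist(points[i]))
    return labels[best]


def border_points_1d(xs: Sequence[float], labels: Sequence[str]) -> List[int]:
    """Indices of border points for distinct 1D training points (assumption: d = 1).

    Consecutive points in sorted order are Voronoi neighbors, so a point is a
    border point when one of its sorted neighbors carries a different label.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    border = set()
    for a, b in zip(order, order[1:]):
        if labels[a] != labels[b]:
            border.update((a, b))
    return sorted(border)


# Illustrative data (made up for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b", "b"]

border = border_points_1d(xs, labels)  # indices 1 and 2: the points 1.0 and 2.0

# Keep only the border points and compare both classifiers on a few queries.
full_pts = [(x,) for x in xs]
red_pts = [(xs[i],) for i in border]
red_labels = [labels[i] for i in border]
queries = [(-1.0,), (0.4,), (1.4,), (1.6,), (3.9,), (10.0,)]
agree = all(
    nn_classify(full_pts, labels, q) == nn_classify(red_pts, red_labels, q)
    for q in queries
)
```

In this toy instance the reduced training set contains $k = 2$ of the $n = 5$ points, and both classifiers place the decision boundary at $1.5$, which is why they agree on the sample queries.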