Deep Low-Density Separation for Semi-Supervised Classification

Title

Deep Low-Density Separation for Semi-Supervised Classification

Authors

Burkhart, Michael C., Shan, Kyle

Abstract

Given a small set of labeled data and a large set of unlabeled data, semi-supervised learning (SSL) attempts to leverage the location of the unlabeled datapoints in order to create a better classifier than could be obtained from supervised methods applied to the labeled training set alone. Effective SSL imposes structural assumptions on the data, e.g., that neighbors are more likely to share a classification or that the decision boundary lies in an area of low density. For complex and high-dimensional data, neural networks can learn feature embeddings to which traditional SSL methods can then be applied in what we call hybrid methods. Previously developed hybrid methods iterate between refining a latent representation and performing graph-based SSL on this representation. In this paper, we introduce a novel hybrid method that instead applies low-density separation to the embedded features. We describe it in detail and discuss why low-density separation may be better suited for SSL on neural network-based embeddings than graph-based algorithms. We validate our method using in-house customer survey data and compare it to other state-of-the-art learning methods. Our approach effectively classifies thousands of unlabeled users from a relatively small number of hand-classified examples.
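
The low-density assumption mentioned in the abstract can be illustrated with a small toy sketch. This is not the paper's method: it uses hypothetical one-dimensional "embedded" features, a hand-picked Gaussian kernel bandwidth, and made-up data, and it simply places a threshold between the two labeled classes where the estimated density of all points (labeled and unlabeled) is lowest. The names `kernel_density` and `low_density_threshold` are introduced here purely for illustration.

```python
import math

def kernel_density(t, points, bandwidth):
    """Unnormalised Gaussian kernel density estimate at location t."""
    return sum(math.exp(-((p - t) ** 2) / (2 * bandwidth ** 2)) for p in points)

def low_density_threshold(labeled, unlabeled, bandwidth=0.5, steps=100):
    """Return the candidate threshold with the lowest estimated density.

    Assumes 1-D features and two classes (0 and 1) whose labeled examples
    do not overlap; this is a toy setting, not a general algorithm.
    """
    lo = max(x for x, y in labeled if y == 0)  # right-most labeled class-0 point
    hi = min(x for x, y in labeled if y == 1)  # left-most labeled class-1 point
    if lo >= hi:
        raise ValueError("labeled classes overlap in this 1-D toy setting")
    all_points = [x for x, _ in labeled] + list(unlabeled)
    # Evenly spaced candidate thresholds strictly between the labeled classes.
    candidates = [lo + (hi - lo) * i / steps for i in range(1, steps)]
    return min(candidates, key=lambda t: kernel_density(t, all_points, bandwidth))

# Hypothetical data: one labeled example per class, ten unlabeled points
# forming a wide cluster on the left and a narrow cluster on the right.
labeled = [(0.0, 0), (6.0, 1)]
unlabeled = [0.3, 0.8, 1.5, 2.1, 2.7, 3.2, 3.3, 5.1, 5.4, 5.8]

threshold = low_density_threshold(labeled, unlabeled)
predictions = [int(x > threshold) for x in unlabeled]
print(threshold, predictions)
```

With only the two labeled points, the midpoint 3.0 would cut through the left cluster. Using the unlabeled points moves the threshold into the sparse gap between the clusters, which is the intuition behind low-density separation. In the hybrid setting the abstract describes, the features would come from a learned neural-network embedding instead of raw one-dimensional values.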
