论文标题

在双曲空间中可证明准确且可扩展的线性分类器

Provably Accurate and Scalable Linear Classifiers in Hyperbolic Spaces

论文作者

Pan, Chao, Chien, Eli, Tabaghi, Puoya, Peng, Jianhao, Milenkovic, Olgica

论文摘要

许多高维实际数据集具有图形或时间序列引起的层次结构。在欧几里得空间中很难处理此类数据集,并且经常在其他空间形式中寻求低维嵌入以执行所需的学习任务。对于层次数据,选择空间是一个双曲线空间,因为它可以保证类似树状结构的低度嵌入。双曲线空间的几何形状具有在试图严格分析算法溶液时构成挑战的欧几里得空间中未遇到的特性。我们提出了一个统一的框架,用于学习可扩展和简单的双曲线线性分类器,并具有可证明的性能保证。我们方法的要旨是专注于庞加莱球模型,并使用切线空间形式主义提出分类问题。我们的结果包括一种新的双曲线感知算法,以及用于双曲线支持向量机分类器的有效且高度准确的凸优化设置。此外,我们适应了适应二阶感知的方法,其中数据是根据二阶信息(相关)进行预处理的,以加速收敛和战略性感知,其中可能以在线方式和决策进行了依据。庞加莱二阶和战略性感知的出色表现表明,所提出的框架可以扩展到双曲线空间中的一般机器学习问题。我们的实验结果,与合成的单细胞RNA-seq表达测量有关,CIFAR10,时尚 - 纳斯特和迷你imagenet,确定所有算法证明所有算法都可以融合,并且具有与其欧几里得对应物的算法相当的复杂性。随附的代码可以在以下网址找到:https://github.com/thupchnsky/poincarelinearclassification。

Many high-dimensional practical data sets have hierarchical structures induced by graphs or time series. Such data sets are hard to process in Euclidean spaces and one often seeks low-dimensional embeddings in other space forms to perform the required learning tasks. For hierarchical data, the space of choice is a hyperbolic space because it guarantees low-distortion embeddings for tree-like structures. The geometry of hyperbolic spaces has properties not encountered in Euclidean spaces that pose challenges when trying to rigorously analyze algorithmic solutions. We propose a unified framework for learning scalable and simple hyperbolic linear classifiers with provable performance guarantees. The gist of our approach is to focus on Poincaré ball models and formulate the classification problems using tangent space formalisms. Our results include a new hyperbolic perceptron algorithm as well as an efficient and highly accurate convex optimization setup for hyperbolic support vector machine classifiers. Furthermore, we adapt our approach to accommodate second-order perceptrons, where data is preprocessed based on second-order information (correlation) to accelerate convergence, and strategic perceptrons, where potentially manipulated data arrives in an online manner and decisions are made sequentially. The excellent performance of the Poincaré second-order and strategic perceptrons shows that the proposed framework can be extended to general machine learning problems in hyperbolic spaces. Our experimental results, pertaining to synthetic, single-cell RNA-seq expression measurements, CIFAR10, Fashion-MNIST and mini-ImageNet, establish that all algorithms provably converge and have complexity comparable to those of their Euclidean counterparts. Accompanying codes can be found at: https://github.com/thupchnsky/PoincareLinearClassification.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源