论文标题

强烈的本地超晶扩散,用于聚类和半监督学习

Strongly Local Hypergraph Diffusions for Clustering and Semi-supervised Learning

论文作者

Liu, Meng, Veldt, Nate, Song, Haoyu, Li, Pan, Gleich, David F.

论文摘要

现在,基于超图的机器学习方法被广泛认为对于建模和使用数据对象之间的高阶和多路关系很重要。局部的超图聚类和半监督学习专门涉及在一组标记的顶点附近找到一组良好的节点。尽管存在许多用于图的局部聚类的方法,但对于超图中的局部聚类,相对较少。此外,存在的人通常缺乏对一般的HyperGraph削减功能建模或无法扩展到大问题的灵活性。为了解决这些问题,本文提出了一种新的基于扩散的超颗粒聚类算法,该算法求解了基于二次超图切割的物镜,类似于Andersen-Chung-chung-lang个性化的Pagerank聚类的HyperGraph类似物。我们证明,对于具有固定最大高度尺寸的图形,此方法是局部局部的,这意味着其运行时仅取决于输出的大小,而不是超图的大小,并且高度可扩展。此外,我们的方法使我们能够使用多种基于基于基础的高雕刻功能来计算。我们还证明,通过解决新的目标函数找到的群集可以满足类似于Cheger的质量保证。我们证明,在大型现实世界中,我们的新方法可以找到更好的簇,并且运行速度比现有方法快得多。具体而言,与基于流量的技术分钟相比,它的超图几秒钟的超图可在几秒钟内进行。我们此外表明,我们的框架足够一般,也可以用于解决其他基于P-Norm的剪裁目标。我们的代码可用\ url {github.com/mengliupurdue/lhqd}。

Hypergraph-based machine learning methods are now widely recognized as important for modeling and using higher-order and multiway relationships between data objects. Local hypergraph clustering and semi-supervised learning specifically involve finding a well-connected set of nodes near a given set of labeled vertices. Although many methods for local clustering exist for graphs, there are relatively few for localized clustering in hypergraphs. Moreover, those that exist often lack flexibility to model a general class of hypergraph cut functions or cannot scale to large problems. To tackle these issues, this paper proposes a new diffusion-based hypergraph clustering algorithm that solves a quadratic hypergraph cut based objective akin to a hypergraph analog of Andersen-Chung-Lang personalized PageRank clustering for graphs. We prove that, for graphs with fixed maximum hyperedge size, this method is strongly local, meaning that its runtime only depends on the size of the output instead of the size of the hypergraph and is highly scalable. Moreover, our method enables us to compute with a wide variety of cardinality-based hypergraph cut functions. We also prove that the clusters found by solving the new objective function satisfy a Cheeger-like quality guarantee. We demonstrate that on large real-world hypergraphs our new method finds better clusters and runs much faster than existing approaches. Specifically, it runs in few seconds for hypergraphs with a few million hyperedges compared with minutes for flow-based technique. We furthermore show that our framework is general enough that can also be used to solve other p-norm based cut objectives on hypergraphs. Our code is available \url{github.com/MengLiuPurdue/LHQD}.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源