Title
Higher-order accurate two-sample network inference and network hashing
Authors
Abstract
Two-sample hypothesis testing for network comparison presents many significant challenges, including: leveraging repeated network observations and known node registration when they are available, without requiring them in order to operate; relaxing strong structural assumptions; achieving finite-sample higher-order accuracy; handling different network sizes and sparsity levels; fast computation and memory parsimony; controlling the false discovery rate (FDR) in multiple testing; and theoretical understanding, particularly regarding finite-sample accuracy and minimax optimality. In this paper, we develop a comprehensive toolbox, featuring a novel main method and its variants, all accompanied by strong theoretical guarantees, to address these challenges. Our method outperforms existing tools in speed and accuracy, and it is provably power-optimal. Our algorithms are user-friendly and versatile in handling various data structures (single or repeated network observations; known or unknown node registration). We also develop an innovative framework for offline hashing and fast querying, which is a very useful tool for large network databases. We showcase the effectiveness of our method through comprehensive simulations and applications to two real-world datasets, which reveal intriguing new structures.