论文标题
通过子图抽样估计图案:第四矩现象
Motif Estimation via Subgraph Sampling: The Fourth Moment Phenomenon
论文作者
论文摘要
网络采样是一种必不可少的工具,可用于了解大型复杂网络的功能,在整个图形上几乎无法搜索。在本文中,我们在广泛使用的子图采样模型中开发了一个用于计数网络基序的统计推断,例如边缘,三角形和楔子,其中每个顶点都是独立采样的,并且观察到了采样的顶点引起的子图。我们为天然Horvitz-Thompson(HT)估计量的一致性和渐近正态性提供了必要和充分的条件,该估计量可用于构建基于采样图的基序计数的置信区间和假设测试。特别是,我们表明,HT估计量的渐近正态性表现出有趣的第四序现象,该现象断言HT估计量(适当居中和重新缩放)在分布中会收敛到标准正常时,每当其第四矩将其转化为3(标准正态分布的第四段))。结果,我们得出了在各种自然图集合中HT估计器的一致性和渐近正态性的确切阈值,例如具有有界程度的稀疏图,Erdos-Renyi随机图,随机常规图和密集图形。
Network sampling is an indispensable tool for understanding features of large complex networks where it is practically impossible to search over the entire graph. In this paper, we develop a framework for statistical inference for counting network motifs, such as edges, triangles, and wedges, in the widely used subgraph sampling model, where each vertex is sampled independently, and the subgraph induced by the sampled vertices is observed. We derive necessary and sufficient conditions for the consistency and the asymptotic normality of the natural Horvitz-Thompson (HT) estimator, which can be used for constructing confidence intervals and hypothesis testing for the motif counts based on the sampled graph. In particular, we show that the asymptotic normality of the HT estimator exhibits an interesting fourth-moment phenomenon, which asserts that the HT estimator (appropriately centered and rescaled) converges in distribution to the standard normal whenever its fourth-moment converges to 3 (the fourth-moment of the standard normal distribution). As a consequence, we derive the exact thresholds for consistency and asymptotic normality of the HT estimator in various natural graph ensembles, such as sparse graphs with bounded degree, Erdos-Renyi random graphs, random regular graphs, and dense graphons.