对社区数量的最佳估计

论文标题

对社区数量的最佳估计

Optimal Estimation of the Number of Communities

论文作者

Jin, Jiashun, Ke, Zheng Tracy, Luo, Shengming, Wang, Minzhe

论文摘要

在网络分析中，如何估计$ k $的社区数量是一个基本问题。我们考虑一个广泛的环境，我们允许严重的学位异质性和广泛的稀疏度，并提出逐步拟合优点（STGOF）作为一种新方法。这是一种逐步的算法，其中$ m = 1，2，\ ldots $，我们或者另一种使用社区检测步骤和合适性（GOF）步骤。我们适应社区检测的分数\ cite {Score}，并提出了一个新的GOF度量。我们表明，在步骤$ m $的情况下，GOF度量的所有$ m <k $的概率分别为$ \ infty $，如果$ m = k $，则收集到$ n（0,1）$。这产生了$ K $的一致估计。另外，我们发现定义我们问题的信噪比（SNR）的正确方法，并表明，如果$ \ mathrm {snr} \ goto 0 $，$ k $的一致估计不存在，而sTGOF均匀地一致，则对于$ k $，如果$ \ \ m athrm {snr {snr} \ goto {snr} \ goto {snr} \ goto \ infty。因此，STGOF实现了最佳相变。已知类似的逐步方法（例如，\ cite {wang2017likelihood，ma2018否定}）面临分析挑战。我们通过在STGOF中使用不同的逐步方案来克服挑战，并得出以前无法使用的尖锐结果。我们分析的关键是表明得分具有{\ it non-Splitting属性（NSP）}。主要是由于Davis-Kahan $ \ sin（θ）$定理所决定的特征向量的不可扣除，因此NSP是不宽松的，无法证明并需要我们开发的新技术。

In network analysis, how to estimate the number of communities $K$ is a fundamental problem. We consider a broad setting where we allow severe degree heterogeneity and a wide range of sparsity levels, and propose Stepwise Goodness-of-Fit (StGoF) as a new approach. This is a stepwise algorithm, where for $m = 1, 2, \ldots$, we alternately use a community detection step and a goodness-of-fit (GoF) step. We adapt SCORE \cite{SCORE} for community detection, and propose a new GoF metric. We show that at step $m$, the GoF metric diverges to $\infty$ in probability for all $m < K$ and converges to $N(0,1)$ if $m = K$. This gives rise to a consistent estimate for $K$. Also, we discover the right way to define the signal-to-noise ratio (SNR) for our problem and show that consistent estimates for $K$ do not exist if $\mathrm{SNR} \goto 0$, and StGoF is uniformly consistent for $K$ if $\mathrm{SNR} \goto \infty$. Therefore, StGoF achieves the optimal phase transition. Similar stepwise methods (e.g., \cite{wang2017likelihood, ma2018determining}) are known to face analytical challenges. We overcome the challenges by using a different stepwise scheme in StGoF and by deriving sharp results that are not available before. The key to our analysis is to show that SCORE has the {\it Non-Splitting Property (NSP)}. Primarily due to a non-tractable rotation of eigenvectors dictated by the Davis-Kahan $\sin(θ)$ theorem, the NSP is non-trivial to prove and requires new techniques we develop.

下载PDF全文

下载文献需遵守相关版权规定

论文标题