Paper title
Going beyond accuracy: estimating homophily in social networks using predictions
Paper authors
Paper abstract
In online social networks, it is common to use predictions of node categories to estimate measures of homophily and other relational properties. However, online social network data often lacks basic demographic information about the nodes. Researchers must rely on predicted node attributes to estimate measures of homophily, but little is known about the validity of these measures. We show that estimating homophily in a network can be viewed as a dyadic prediction problem, and that homophily estimates are unbiased when dyad-level residuals sum to zero in the network. Node-level prediction models, such as those that use names to classify ethnicity or gender, do not generally have this property and can introduce large biases into homophily estimates. This bias arises from error autocorrelation along dyads. Importantly, node-level classification performance is not a reliable indicator of estimation accuracy for homophily. We compare estimation strategies that make predictions at the node and dyad levels, evaluating performance in different settings. We propose a novel "ego-alter" modeling approach that outperforms standard node and dyad classification strategies. While this paper focuses on homophily, the results generalize to other relational measures that aggregate predictions along the dyads in a network. We conclude with suggestions for research designs to study homophily in online networks. Code for this paper is available at https://github.com/georgeberry/autocorr.
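The claim that an accurate node-level classifier can still give a biased homophily estimate can be illustrated with a toy simulation. The sketch below is not the paper's code or its "ego-alter" model. It assumes binary node classes, dyads whose endpoints are drawn independently of other dyads, and a hypothetical node classifier that flips each label independently with probability `1 - node_accuracy`. The function name `simulate` and all parameter names are illustrative only.

```python
import random


def simulate(n_edges=20000, true_same_share=0.8, node_accuracy=0.9, seed=0):
    """Toy comparison of true and predicted same-class tie shares.

    Assumptions (not from the paper): binary classes, one independent
    dyad per draw, and label errors that are independent across nodes.
    """
    rng = random.Random(seed)
    true_same = []  # 1 if ego and alter truly share a class
    pred_same = []  # 1 if their predicted classes match

    for _ in range(n_edges):
        ego = rng.randint(0, 1)
        # The alter shares the ego's class with probability true_same_share.
        alter = ego if rng.random() < true_same_share else 1 - ego
        # Hypothetical node-level classifier: each label is correct
        # with probability node_accuracy, independently per node.
        ego_hat = ego if rng.random() < node_accuracy else 1 - ego
        alter_hat = alter if rng.random() < node_accuracy else 1 - alter
        true_same.append(int(ego == alter))
        pred_same.append(int(ego_hat == alter_hat))

    truth = sum(true_same) / n_edges
    estimate = sum(pred_same) / n_edges
    # Dyad-level residual: true indicator minus predicted indicator.
    mean_residual = sum(t - p for t, p in zip(true_same, pred_same)) / n_edges
    return truth, estimate, mean_residual


if __name__ == "__main__":
    truth, estimate, mean_residual = simulate()
    print(f"true same-class share:      {truth:.3f}")
    print(f"estimated same-class share: {estimate:.3f}")
    print(f"mean dyad-level residual:   {mean_residual:.3f}")
```

Under these assumptions, a dyad that is truly same-class is predicted same-class with probability 0.9² + 0.1² = 0.82, and a truly different-class dyad with probability 2 × 0.9 × 0.1 = 0.18. The expected estimate is therefore 0.8 × 0.82 + 0.2 × 0.18 = 0.692, against a true share of 0.8. The gap equals the mean dyad-level residual, which is zero only when the residuals sum to zero across the network.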