Paper Title
Learning Social Networks from Text Data using Covariate Information
Authors
Abstract
Describing and characterizing the impact of historical figures can be challenging, but unraveling their social structures is perhaps even more so. Historical social network analysis methods can help. They may also bring to light people whom historians have overlooked but who turn out to be influential social connection points. Text data, such as biographies, can be a useful source of information about the structure of historical social networks, but it can also make links hard to identify. The Local Poisson Graphical Lasso model uses the number of co-mentions in the text to measure relationships between people, and it models a social network through a conditional independence structure. This structure reduces the tendency to overstate the relationship between "friends of friends". However, common names occur with high frequency in historical records, so without additional distinguishing information the model can still introduce incorrect links. In this work, we extend the Local Poisson Graphical Lasso model with a (multiple) penalty structure that incorporates covariates, giving higher link probabilities to people who share covariate information. We propose both greedy and Bayesian approaches to estimate the penalty parameters. We present results on data simulated with the characteristics of historical networks and show that this type of penalty structure can improve network recovery, as measured by precision and recall. We also illustrate the approach on biographical data of individuals who lived in early modern Britain, targeting the period from 1500 to 1575.
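The abstract does not spell out the model's equations. The following is therefore a minimal Python sketch built on our own assumptions, and it only illustrates the general idea of a covariate-dependent penalty in a node-wise (local) Poisson lasso. Rows of X are assumed to be documents (for example, biographies), columns are people, and entries are co-mention counts. For each person j, the sketch fits a Poisson regression of column j on the other columns. It minimizes the mean Poisson negative log-likelihood plus a weighted L1 penalty, the sum over k of lambda_jk |theta_jk|. The following pieces are hypothetical choices, not the authors' method:
- the two-level penalty (a smaller lam_shared when j and k share a covariate value, lam_other otherwise);
- proximal-gradient fitting;
- the "OR" rule that symmetrizes edges.

The greedy and Bayesian penalty-selection procedures are not shown.

```python
import math

def soft_threshold(z, t):
    """Proximal operator of t*|x|: shrink z toward zero by t (exactly 0 inside [-t, t])."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def penalty_weights(groups, j, lam_shared, lam_other):
    """Hypothetical two-level penalty for node j, ordered like the other columns.

    Pairs sharing a covariate value get the (smaller) lam_shared, making links easier to keep.
    """
    return [lam_shared if groups[k] == groups[j] else lam_other
            for k in range(len(groups)) if k != j]

def local_poisson_lasso(X, j, lams, step=0.01, iters=2000):
    """Weighted-L1 Poisson regression of column j on the other columns via proximal gradient."""
    n, p = len(X), len(X[0])
    others = [k for k in range(p) if k != j]
    b0, theta = 0.0, [0.0] * len(others)
    for _ in range(iters):
        # Gradient of the mean Poisson negative log-likelihood: mean((exp(eta) - y) * x).
        g0, g = 0.0, [0.0] * len(others)
        for row in X:
            eta = b0 + sum(t * row[k] for t, k in zip(theta, others))
            r = math.exp(eta) - row[j]
            g0 += r / n
            for m, k in enumerate(others):
                g[m] += r * row[k] / n
        b0 -= step * g0  # the intercept is not penalized
        # Gradient step followed by per-coefficient soft-thresholding.
        theta = [soft_threshold(t - step * gm, step * lam)
                 for t, gm, lam in zip(theta, g, lams)]
    return dict(zip(others, theta))

def estimate_network(X, groups, lam_shared, lam_other, **kw):
    """Fit every node's local regression; keep edge (j, k) if either estimate is nonzero (OR rule)."""
    p = len(X[0])
    edges = set()
    for j in range(p):
        lams = penalty_weights(groups, j, lam_shared, lam_other)
        for k, t in local_poisson_lasso(X, j, lams, **kw).items():
            if t != 0.0:
                edges.add((min(j, k), max(j, k)))
    return edges

def precision_recall(estimated, truth):
    """Edge-set precision and recall (empty sets are scored 1.0 by convention here)."""
    tp = len(estimated & truth)
    precision = tp / len(estimated) if estimated else 1.0
    recall = tp / len(truth) if truth else 1.0
    return precision, recall
```

A call such as `estimate_network(X, groups, lam_shared=0.05, lam_other=0.5)` returns a set of `(j, k)` index pairs. Passing that set to `precision_recall` along with the edges of a known simulated network mirrors the kind of evaluation the abstract describes. Setting `lam_shared` below `lam_other` is what lets shared covariates raise the chance that a link survives the penalty.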