Paper Title
An Ordinal Latent Variable Model of Conflict Intensity
Paper Authors
Paper Abstract
Measuring the intensity of events is crucial for monitoring and tracking armed conflict. Advances in automated event extraction have yielded massive data sets of "who did what to whom" micro-records that enable data-driven approaches to monitoring conflict. The Goldstein scale is a widely used expert-based measure that scores events on a conflictual-cooperative scale. It is based only on the action category ("what") and disregards the subject ("who") and object ("to whom") of an event, as well as contextual information, such as the associated casualty count, that should contribute to the perception of an event's "intensity". This paper takes a latent variable-based approach to measuring conflict intensity. We introduce a probabilistic generative model that assumes each observed event is associated with a latent intensity class. A novel aspect of this model is that it imposes an ordering on the classes, such that higher-valued classes denote higher levels of intensity. The ordinal nature of the latent variable is induced from naturally ordered aspects of the data (e.g., casualty counts), where higher values indicate higher intensity. We evaluate the proposed model both intrinsically and extrinsically, showing that it obtains comparatively good held-out predictive performance.