The Poisson Multinomial Distribution and Its Applications in Voting Theory, Ecological Inference, and Machine Learning

Paper Title

The Poisson Multinomial Distribution and Its Applications in Voting Theory, Ecological Inference, and Machine Learning

Authors

Lin, Zhengzhi; Wang, Yueyao; Hong, Yili

Abstract

The Poisson multinomial distribution (PMD) describes the distribution of the sum of $n$ independent but non-identically distributed random vectors, where each random vector has length $m$, takes 0/1 values, and has exactly one element equal to 1, with a certain probability for each element. These probabilities differ across the $m$ elements and the $n$ random vectors, and form an $n \times m$ matrix whose rows each sum to 1. We call this $n \times m$ matrix the success probability matrix (SPM). Each SPM uniquely defines a PMD. The PMD is useful in many areas, such as voting theory, ecological inference, and machine learning. The distribution functions of the PMD, however, are usually difficult to compute. In this paper, we develop efficient methods to compute the probability mass function (pmf) of the PMD using the multivariate Fourier transform, normal approximation, and simulation. We study the accuracy and efficiency of these methods and give recommendations on which method to use under various scenarios. We also illustrate the use of the PMD via three applications, namely voting probability calculation, aggregated data inference, and uncertainty quantification in classification. We build an R package that implements the proposed methods, and illustrate the package with examples.
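
To make the definition concrete, the following is a minimal Python sketch that computes the exact pmf of a small PMD by adding one random vector at a time. It is not the Fourier-transform method developed in the paper and it does not use the authors' R package; the function name `pmd_pmf` and the example SPM are hypothetical, chosen only for illustration. The approach enumerates every reachable count vector, so it is practical only for small $n$ and $m$.

```python
from collections import defaultdict


def pmd_pmf(spm):
    """Exact pmf of a PMD, built up one row of the SPM at a time.

    spm: list of n rows, each a list of m probabilities summing to 1.
    Returns a dict mapping each count vector (a tuple of length m whose
    entries sum to n) to its probability.
    Hypothetical helper for illustration; brute force, small inputs only.
    """
    m = len(spm[0])
    # Before any vector is added, the sum is the zero vector with probability 1.
    dist = {(0,) * m: 1.0}
    for row in spm:
        if len(row) != m or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each SPM row must have length m and sum to 1")
        nxt = defaultdict(float)
        for counts, prob in dist.items():
            for j, p in enumerate(row):
                if p == 0.0:
                    continue
                # The new vector has its single 1 in position j with probability p.
                bumped = counts[:j] + (counts[j] + 1,) + counts[j + 1:]
                nxt[bumped] += prob * p
        dist = dict(nxt)
    return dist


# Example SPM (assumed values): n = 3 voters, m = 3 candidates.
spm = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
]
pmf = pmd_pmf(spm)

# Probability that all three voters choose the first candidate: 0.5 * 0.1 * 0.4.
p_unanimous_first = pmf[(3, 0, 0)]
```

In the voting interpretation, each row of the SPM holds one voter's probabilities of choosing each candidate, and the pmf gives the probability of every possible vote tally. The number of count vectors grows quickly with $n$ and $m$, which is one reason the exact computation is hard in general.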
