Paper Title

Missing Mass Estimation from Sticky Channels

Authors

Prafulla Chandra, Andrew Thangaraj, Nived Rajaraman

Abstract

Distribution estimation under error-prone or non-ideal sampling, modelled as "sticky" channels, has been studied recently, motivated by applications such as DNA computing. Missing mass, the sum of the probabilities of the letters that do not appear in the sample, is an important quantity that plays a crucial role in distribution estimation, particularly in the large-alphabet regime. In this work, we consider the problem of estimating missing mass, which has been well studied under independent and identically distributed (i.i.d.) sampling, in the case where sampling is "sticky". Precisely, we consider the scenario where each sample from an unknown distribution is repeated a geometrically distributed number of times. We characterise the minimax rate of the mean squared error (MSE) of estimating missing mass from such sticky sampling channels. An upper bound on the minimax rate is obtained by bounding the risk of a modified Good-Turing estimator. We derive a matching lower bound on the minimax rate by extending the Le Cam method.
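The sticky sampling model described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' method: it assumes the repetition count is geometric on {1, 2, ...} with a hypothetical continuation probability `stay_prob`, and it uses the classical Good-Turing estimate (fraction of singletons) on the run-collapsed output as a naive baseline, since the abstract does not specify the paper's modified estimator. All function names are illustrative.

```python
import random
from collections import Counter


def sticky_channel(samples, stay_prob, rng):
    """Repeat each sample a geometric number of times (support 1, 2, ...).

    Assumption: after each emission, the same letter is emitted again
    with probability `stay_prob`.
    """
    out = []
    for x in samples:
        out.append(x)
        while rng.random() < stay_prob:
            out.append(x)
    return out


def missing_mass(probs, observed):
    """True missing mass: total probability of letters never observed."""
    return sum(p for letter, p in probs.items() if letter not in observed)


def collapse_runs(seq):
    """Merge each run of identical consecutive letters into one letter.

    This does not invert the channel exactly: two equal adjacent i.i.d.
    samples are also merged into a single letter.
    """
    out = []
    for x in seq:
        if not out or out[-1] != x:
            out.append(x)
    return out


def good_turing(seq):
    """Classical Good-Turing estimate: (number of singletons) / (sample size)."""
    if not seq:
        raise ValueError("need at least one sample")
    counts = Counter(seq)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(seq)


if __name__ == "__main__":
    rng = random.Random(0)
    k, n = 500, 300  # hypothetical alphabet size and number of i.i.d. draws
    probs = {letter: 1.0 / k for letter in range(k)}
    clean = rng.choices(list(probs), weights=list(probs.values()), k=n)
    sticky = sticky_channel(clean, stay_prob=0.5, rng=rng)

    print("true missing mass:        ", missing_mass(probs, set(sticky)))
    print("Good-Turing, raw output:  ", good_turing(sticky))
    print("Good-Turing, runs merged: ", good_turing(collapse_runs(sticky)))
```

On the raw sticky output, repeated letters rarely remain singletons, so the unmodified estimate tends to be pulled away from the true missing mass; this is one way to see why a modification of Good-Turing is needed in this setting.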
