Paper title

Bump hunting through density curvature features

Authors

José E. Chacón, Javier Fernández Serrano

Abstract

Bump hunting deals with finding in sample spaces meaningful data subsets known as bumps. These have traditionally been conceived as modal or concave regions in the graph of the underlying density function. We define an abstract bump construct based on curvature functionals of the probability density. Then, we explore several alternative characterizations involving derivatives up to second order. In particular, a suitable implementation of Good and Gaskins' original concave bumps is proposed in the multivariate case. Moreover, we bring to exploratory data analysis concepts like the mean curvature and the Laplacian that have produced good results in applied domains. Our methodology addresses the approximation of the curvature functional with a plug-in kernel density estimator. We provide theoretical results that assure the asymptotic consistency of bump boundaries in the Hausdorff distance with affordable convergence rates. We also present asymptotically valid and consistent confidence regions bounding curvature bumps. The theory is illustrated through several use cases in sports analytics with datasets from the NBA, MLB and NFL. We conclude that the different curvature instances effectively combine to generate insightful visualizations.
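To make the plug-in idea in the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a Gaussian kernel with a single scalar bandwidth `h`, and it assumes that a Laplacian bump is the region where the Laplacian of the density is negative. The function name `kde_laplacian` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def kde_laplacian(x, data, h):
    """Laplacian of a Gaussian kernel density estimate (illustrative sketch).

    x    : array of shape (m, d), evaluation points
    data : array of shape (n, d), observed sample
    h    : positive scalar bandwidth (assumption: isotropic Gaussian kernel)
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    d = data.shape[1]

    # Pairwise differences between evaluation points and sample points.
    diff = x[:, None, :] - data[None, :, :]          # shape (m, n, d)
    sq = np.sum(diff ** 2, axis=2)                   # squared distances, (m, n)

    # Gaussian kernel scaled by the bandwidth h.
    kern = np.exp(-0.5 * sq / h ** 2) / ((2.0 * np.pi) ** (d / 2) * h ** d)

    # Laplacian of the scaled Gaussian: K_h(u) * (|u|^2 / h^4 - d / h^2).
    # Averaging over the sample gives the plug-in estimate of the Laplacian.
    return np.mean(kern * (sq / h ** 4 - d / h ** 2), axis=1)
```

Under the stated assumption, an estimated bump on a set of evaluation points `grid` would be the subset where `kde_laplacian(grid, data, h) < 0`. The bandwidth choice is left open here; estimating second derivatives typically calls for a larger bandwidth than estimating the density itself.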
