Density Regression with Conditional Support Points

Paper Title

Density Regression with Conditional Support Points

Authors

Chen, Yunlu; Zhang, Nan

Abstract

Density regression characterizes the conditional density of the response variable given the covariates, and provides much more information than the commonly used conditional mean or quantile regression. However, it is often computationally prohibitive in applications with massive data sets, especially when there are multiple covariates. In this paper, we develop a new data reduction approach for the density regression problem using conditional support points. After obtaining the representative data, we exploit the penalized likelihood method as the downstream estimation strategy. Based on the connections among the continuous ranked probability score, the energy distance, the $L_2$ discrepancy and the symmetrized Kullback-Leibler distance, we investigate the distributional convergence of the representative points and establish the rate of convergence of the density regression estimator. The usefulness of the methodology is illustrated by modeling the conditional distribution of power output given multivariate environmental factors using a large-scale wind turbine data set. Supplementary materials for this article are available online.
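The abstract relies on two ideas: representative points chosen by energy distance, and the link between that distance and the continuous ranked probability score (CRPS). The Python sketch below illustrates both under stated assumptions. It uses Euclidean distance and V-statistic (all-pairs) averages, and it computes three things:

- the energy distance between two point sets;
- the energy score, which reduces to the CRPS when the response is scalar;
- ordinary, unconditional support points, found by a majorize-minimize fixed-point iteration.

All function names are hypothetical. This is not the authors' conditional-support-point procedure or their penalized likelihood estimator. It only shows the building blocks the abstract names.

```python
import math
import random


def mean_pairwise(a, b):
    """Average Euclidean distance ||a_i - b_j|| over all pairs (a V-statistic)."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def energy_distance(x, y):
    """Squared energy distance between the empirical distributions of x and y."""
    return 2 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)


def energy_score(x, obs):
    """Energy score of the empirical forecast x at one observation (the CRPS when d = 1)."""
    return mean_pairwise(x, [obs]) - 0.5 * mean_pairwise(x, x)


def support_points(data, n, iters=50, eps=1e-10, seed=0):
    """Pick n points approximately minimizing energy_distance(points, data).

    Majorize-minimize fixed-point update, applied to all points at once.
    Illustrates unconditional support points only, not a conditional variant.
    """
    rng = random.Random(seed)
    pts = [list(p) for p in rng.sample(data, n)]  # start from a random subsample
    big_n, dim = len(data), len(data[0])
    for _ in range(iters):
        new_pts = []
        for i, xi in enumerate(pts):
            num = [0.0] * dim
            q = 0.0
            # Attraction: data points weighted by 1 / ||x_i - y_m||.
            for y in data:
                w = 1.0 / max(math.dist(xi, y), eps)
                q += w
                for k in range(dim):
                    num[k] += w * y[k]
            # Repulsion: unit vectors away from the other points, scaled by N / n.
            for j, xj in enumerate(pts):
                if j != i:
                    r = max(math.dist(xi, xj), eps)
                    for k in range(dim):
                        num[k] += (big_n / n) * (xi[k] - xj[k]) / r
            new_pts.append([v / q for v in num])
        pts = new_pts  # every point was updated from the previous iterate
    return [tuple(p) for p in pts]


# Usage: compress 300 bivariate normal draws into 10 representative points.
gen = random.Random(1)
data = [(gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0)) for _ in range(300)]
sp = support_points(data, 10)
init = random.Random(0).sample(data, 10)  # the same starting subsample
print("energy distance, support points:   ", energy_distance(sp, data))
print("energy distance, random subsample: ", energy_distance(init, data))
```

The iteration should lower the energy distance relative to the random subsample it starts from, because each step minimizes a surrogate that majorizes the objective. With V-statistics, averaging the energy score over the observations gives exactly half the energy distance plus half the mean within-sample distance of the data. This is one simple form of the CRPS and energy distance connection mentioned in the abstract.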
