Paper title
Simultaneous semi-parametric estimation of clustering and regression
Authors
Abstract
We investigate parameter estimation for regression models with fixed group effects when the group variable is missing but group-related variables are available. This problem involves two tasks. Clustering infers the missing group variable from the group-related variables. Regression models the target variable given the group and possibly additional variables. The problem can therefore be formulated as modeling the joint distribution of the target and the group-related variables. The usual parameter estimation strategy for this joint model is a two-step approach: first learn the group variable (clustering step), then plug in its estimate to fit the regression model (regression step). However, this approach is suboptimal, and in particular yields biased regression estimates, because it does not use the target variable for clustering. We therefore advocate estimating clustering and regression simultaneously, within a semi-parametric framework. Numerical experiments covering a wide range of distributions and regression models illustrate the benefits of this approach. The relevance of the new method is illustrated on real data concerning the prevention of high blood pressure.
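To make the bias of the two-step approach concrete, here is a minimal sketch under a deliberately simple, fully parametric toy model. It is not the semi-parametric estimator proposed in the paper. In the toy model there are two latent groups with equal weight. The group-related variable is Gaussian, X | Z=k ~ N(mu_k, 1), and the target has a group effect only, Y | Z=k ~ N(alpha_k, 0.5^2), so the joint density is p(x, y) = sum_k pi_k f_k(x) g_k(y). The two-step estimator clusters X with 1-D k-means (one possible clustering step) and then regresses Y on the plugged-in labels. The joint estimator runs EM on the mixture over (X, Y), so Y also drives the clustering. All function names (`simulate`, `two_step`, `joint_em`) and parameter values are hypothetical choices made for illustration.

```python
import math
import random

TRUE_ALPHA = (0.0, 2.0)  # hypothetical fixed group effects on Y
TRUE_MU = (0.0, 1.5)     # hypothetical group means of X (sd 1, so heavy overlap)


def simulate(n, seed=0):
    """Draw (x, y) pairs; the group label z is generated, then discarded."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        x = rng.gauss(TRUE_MU[z], 1.0)            # group-related variable
        y = TRUE_ALPHA[z] + rng.gauss(0.0, 0.5)   # target with a group effect
        data.append((x, y))
    return data


def normal_pdf(v, mean, var):
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def weighted_means(values, resp):
    """Means of `values` for groups 0 and 1 under weights (1 - r, r)."""
    w1 = sum(resp)
    w0 = len(resp) - w1
    m0 = sum((1.0 - r) * v for r, v in zip(resp, values)) / w0
    m1 = sum(r * v for r, v in zip(resp, values)) / w1
    return m0, m1


def two_step(data, n_iter=100):
    """Clustering step: 1-D k-means on X only.
    Regression step: OLS of Y on group dummies, which reduces to group means."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    c0, c1 = min(xs), max(xs)  # initial centroids; both groups stay non-empty
    for _ in range(n_iter):
        labels = [1.0 if abs(x - c1) < abs(x - c0) else 0.0 for x in xs]
        c0, c1 = weighted_means(xs, labels)
    return sorted(weighted_means(ys, labels))


def joint_em(data, n_iter=100):
    """EM for a two-component mixture on (X, Y) with shared variances:
    X | k ~ N(mx_k, vx), Y | k ~ N(alpha_k, vy). Y now informs the clustering."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    n = len(data)
    med = sorted(xs)[n // 2]
    resp = [1.0 if x > med else 0.0 for x in xs]  # start from a median split on X
    for _ in range(n_iter):
        # M-step: mixing weight, component means, shared variances.
        pi1 = sum(resp) / n
        mx = weighted_means(xs, resp)
        alpha = weighted_means(ys, resp)
        vx = sum((1 - r) * (x - mx[0]) ** 2 + r * (x - mx[1]) ** 2
                 for r, x in zip(resp, xs)) / n
        vy = sum((1 - r) * (y - alpha[0]) ** 2 + r * (y - alpha[1]) ** 2
                 for r, y in zip(resp, ys)) / n
        # E-step: posterior probability of component 1 given (x, y).
        resp = []
        for x, y in data:
            l0 = (1 - pi1) * normal_pdf(x, mx[0], vx) * normal_pdf(y, alpha[0], vy)
            l1 = pi1 * normal_pdf(x, mx[1], vx) * normal_pdf(y, alpha[1], vy)
            resp.append(l1 / (l0 + l1))
    return sorted(weighted_means(ys, resp))


if __name__ == "__main__":
    data = simulate(1000)
    a_two = two_step(data)
    a_joint = joint_em(data)
    print("true group-effect gap:", TRUE_ALPHA[1] - TRUE_ALPHA[0])
    print("two-step estimate    :", round(a_two[1] - a_two[0], 3))
    print("joint EM estimate    :", round(a_joint[1] - a_joint[0], 3))
```

In this setup the X clusters overlap heavily, so the plug-in labels mix the two groups. That mixing pulls the two-step estimate of the group-effect gap toward zero. The joint EM uses Y to separate the groups, so its estimate should land close to the true gap of 2. The Gaussian, shared-variance components are a simplification chosen to keep the sketch short. The semi-parametric framework described above relaxes such distributional assumptions.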