Paper Title
Automatic Search Intervals for the Smoothing Parameter in Penalized Splines
Abstract
The selection of the smoothing parameter is central to the estimation of penalized splines. The best value of the smoothing parameter is often the one that optimizes a smoothness selection criterion, such as the generalized cross-validation error (GCV) or the restricted likelihood (REML). To correctly identify the global optimum rather than becoming trapped in an undesired local optimum, the criterion is best optimized by grid search. Unfortunately, the grid search method requires a pre-specified search interval that contains the unknown global optimum, yet no guideline is available for choosing this interval. As a result, practitioners have to find it by trial and error. To overcome this difficulty, we develop novel algorithms that find this interval automatically. Our automatic search interval has four advantages. (i) It specifies a smoothing parameter range over which the associated penalized least squares problem is numerically solvable. (ii) It is criterion-independent, so different criteria, such as GCV and REML, can be explored on the same parameter range. (iii) It is wide enough to contain the global optimum of any criterion, so that, for example, both the global minimum of GCV and the global maximum of REML can be identified. (iv) It is computationally cheap compared with the grid search itself, adding no extra computational burden in practice. Our method is available in our recently developed R package gps (version 1.1 or later). It may be embedded in more advanced statistical modeling methods that rely on penalized splines.
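The grid search described above, and its dependence on a hand-picked interval, can be illustrated with a minimal sketch. This is not the authors' algorithm and not the gps package's interface. It assumes NumPy and a simplified discrete penalized least-squares smoother (an identity basis with a second-order difference penalty, i.e. a Whittaker smoother) standing in for a full B-spline basis. It scores GCV on a log-spaced grid over an arbitrarily chosen interval from 10^-4 to 10^6. Picking that interval by hand is exactly the trial-and-error step the abstract proposes to automate. The names `difference_matrix`, `gcv_score` and `grid_search_gcv` are hypothetical.

```python
import numpy as np


def difference_matrix(n, order=2):
    """Return the (n - order) x n matrix of order-th differences."""
    return np.diff(np.eye(n), n=order, axis=0)


def gcv_score(y, D, lam):
    """GCV for the penalized least-squares fit theta = (I + lam * D'D)^{-1} y.

    Assumption: identity basis (Whittaker smoother); a penalized spline
    would use a basis matrix B, giving H = B (B'B + lam D'D)^{-1} B'.
    """
    n = y.size
    A = np.eye(n) + lam * (D.T @ D)
    H = np.linalg.solve(A, np.eye(n))  # hat (influence) matrix
    fitted = H @ y
    rss = np.sum((y - fitted) ** 2)
    edf = np.trace(H)  # effective degrees of freedom
    # As lam -> 0, edf -> n, so GCV becomes a ratio of two tiny numbers;
    # a poorly chosen lower bound can make the score numerically unreliable.
    return n * rss / (n - edf) ** 2


def grid_search_gcv(y, log10_lo, log10_hi, num=101, order=2):
    """Minimize GCV over lambda values equally spaced on the log10 scale.

    The interval [10**log10_lo, 10**log10_hi] is supplied by hand here
    (a hypothetical choice), which is the step the paper automates.
    """
    D = difference_matrix(y.size, order)
    log_grid = np.linspace(log10_lo, log10_hi, num)
    scores = np.array([gcv_score(y, D, 10.0 ** g) for g in log_grid])
    best = int(np.argmin(scores))  # global minimum over the grid
    return 10.0 ** log_grid[best], log_grid, scores


# Usage on simulated data: a noisy sine curve observed at 100 points.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
lam_best, log_grid, scores = grid_search_gcv(y, -4.0, 6.0)
print(f"GCV-optimal lambda on the grid: {lam_best:.3g}")
```

Searching on the log scale reflects that the criterion typically varies over several orders of magnitude of the smoothing parameter. Whether the returned value is the global optimum depends entirely on the interval actually containing it, which is the guarantee the automatic search interval is meant to provide.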