Paper title
Model diagnostics of discrete data regression: a unifying framework using functional residuals
Authors
Abstract
Model diagnostics is an indispensable component of regression analysis, yet it is not well addressed in standard textbooks on generalized linear models. The lack of exposition is attributed to the fact that when outcome data are discrete, classical methods (e.g., Pearson/deviance residual analysis and goodness-of-fit tests) have limited utility in model diagnostics and treatment. This paper establishes a novel framework for model diagnostics of discrete data regression. Unlike the literature, which defines a single-valued quantity as the residual, we propose to use a function as a vehicle to retain the residual information. In the presence of discreteness, we show that such a functional residual is appropriate for summarizing the residual randomness that cannot be captured by the structural part of the model. We establish its theoretical properties, which lead to new diagnostic tools, including the functional-residual-vs-covariate plot and the Function-to-Function (Fn-Fn) plot. Our numerical studies demonstrate that these tools can reveal a variety of model misspecifications, such as failing to properly include a higher-order term, an explanatory variable, an interaction effect, a dispersion parameter, or a zero-inflation component. The functional residual yields, as a byproduct, Liu-Zhang's surrogate residual, which was mainly developed for cumulative link models for ordinal data (Liu and Zhang, 2018, JASA). As a general notion, it considerably broadens the diagnostic scope, as it applies to virtually all parametric models for binary, ordinal, and count data within a single unified diagnostic scheme.
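The abstract does not spell out the functional residual, so the following Python sketch is only an illustration under a stated assumption: the residual for an observation y is taken to be the distribution function of a uniform variable on the interval [F(y-1 | x), F(y | x)], where F is the fitted model's CDF. This is the familiar randomized probability-integral-transform construction and may differ from the paper's exact definition. The Poisson model and the names `poisson_cdf`, `functional_residual`, and `average_residual` are hypothetical choices made for this example.

```python
import math


def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= mu / i  # P(Y = i) from P(Y = i - 1)
        total += term
    return min(total, 1.0)


def functional_residual(t, y, mu):
    """Assumed functional residual evaluated at t in [0, 1].

    ASSUMPTION: it is the CDF of Uniform(F(y-1), F(y)), where F is the
    Poisson(mu) CDF. This is an illustrative definition only.
    """
    lo = poisson_cdf(y - 1, mu)
    hi = poisson_cdf(y, mu)
    if t <= lo:
        return 0.0
    if t >= hi:
        return 1.0
    return (t - lo) / (hi - lo)


def average_residual(t, mu, y_max=100):
    """Model-weighted average of the functional residuals at t.

    Each count y is weighted by its probability under Poisson(mu).
    When the model is correct, this average equals t, i.e. the averaged
    residual function is the Uniform(0, 1) CDF.
    """
    total = 0.0
    for y in range(y_max + 1):
        pmf = poisson_cdf(y, mu) - poisson_cdf(y - 1, mu)
        total += pmf * functional_residual(t, y, mu)
    return total


if __name__ == "__main__":
    for t in (0.1, 0.3, 0.5, 0.9):
        print(t, round(average_residual(t, mu=2.5), 6))
```

Under this assumed definition, the averaged residual function lies on the 45-degree line when the model is correctly specified. This gives a rough intuition for how a function-versus-function comparison could expose misspecification, although the paper's Fn-Fn plot may be constructed differently.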