Paper Title

Minimum Sample Size for Developing a Multivariable Prediction Model using Multinomial Logistic Regression

Authors

Alexander Pate, Richard D Riley, Gary S Collins, Maarten van Smeden, Ben Van Calster, Joie Ensor, Glen P Martin

Abstract

Multinomial logistic regression models allow one to predict the risk of a categorical outcome with more than two categories. When developing such a model, researchers should ensure the number of participants (n) is appropriate relative to the number of events (E_k) and the number of predictor parameters (p_k) for each category k. We propose three criteria to determine the minimum n required, in light of existing criteria developed for binary outcomes. The first criterion aims to minimise model overfitting. The second aims to minimise the difference between the observed and adjusted Nagelkerke R². The third aims to ensure the overall risk is estimated precisely. For criterion (i), we show that the sample size must be based on the anticipated Cox-Snell R² of the distinct one-to-one logistic regression models corresponding to the sub-models of the multinomial logistic regression, rather than on the overall Cox-Snell R² of the multinomial logistic regression. We tested the performance of the proposed criterion (i) through a simulation study and found that it resulted in the desired level of overfitting. Criteria (ii) and (iii) are natural extensions of previously proposed criteria for binary outcomes. We illustrate how to implement the sample size criteria through a worked example considering the development of a multinomial risk prediction model for tumour type when presented with an ovarian mass. Code is provided for the simulation and worked example. We will embed our proposed criteria within the pmsampsize R library and Stata modules.
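
As a rough illustration of criterion (i), the sketch below applies the shrinkage-based sample size formula previously published for binary outcomes, n = p / ((S − 1) ln(1 − R²_CS / S)), to each pairwise sub-model and then scales the result up to the full cohort. This is not the authors' code. The function names, the targeted shrinkage of 0.9, the assumption that every sub-model has the same number of predictor parameters, and the scaling step (dividing by the proportion of participants falling in the two categories of the pair) are assumptions made for illustration, and all input values are hypothetical.

```python
import math
from itertools import combinations


def n_for_shrinkage(n_params, r2_cs, shrinkage=0.9):
    """Sample size for one binary (pairwise) sub-model.

    Uses the binary-outcome formula n = p / ((S - 1) * ln(1 - R2_CS / S)),
    where S is the targeted shrinkage factor (0.9 is an assumed default).
    """
    return n_params / ((shrinkage - 1.0) * math.log(1.0 - r2_cs / shrinkage))


def criterion_i(prevalence, pairwise_r2_cs, n_params, shrinkage=0.9):
    """Illustrative total sample size under criterion (i).

    prevalence:     dict mapping category -> anticipated outcome proportion
    pairwise_r2_cs: dict mapping (k, r) -> anticipated Cox-Snell R2 of the
                    logistic model comparing categories k and r; keys are
                    pairs of category labels in sorted order
    n_params:       predictor parameters per sub-model (assumed equal)
    """
    worst = 0.0
    for k, r in combinations(sorted(prevalence), 2):
        # Participants needed within categories k and r only.
        n_pair = n_for_shrinkage(n_params, pairwise_r2_cs[(k, r)], shrinkage)
        # Assumed scaling: only a fraction of the cohort falls in k or r.
        n_total = n_pair / (prevalence[k] + prevalence[r])
        worst = max(worst, n_total)
    return math.ceil(worst)


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the paper.
    prevalence = {"a": 0.5, "b": 0.3, "c": 0.2}
    r2 = {("a", "b"): 0.2, ("a", "c"): 0.2, ("b", "c"): 0.2}
    print(criterion_i(prevalence, r2, n_params=10))
```

Taking the maximum over all pairs reflects the idea that the rarest pair of categories, or the pair with the weakest anticipated signal, drives the requirement. A real study would take the anticipated R² values and outcome proportions from prior literature or pilot data.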
