Paper title
Distributed non-disclosive validation of predictive models by a modified ROC-GLM
Paper authors
Paper abstract
Distributed statistical analyses provide a promising approach to privacy protection when analysing data distributed over several databases. They bring the analysis to the data rather than the data to the analysis. The analyst receives anonymous summary statistics, which are combined into an aggregated result. We are interested in calculating the AUC of a prediction score with a distributed approach, without access to the individual-level data of the subjects held in the different databases. We use DataSHIELD as the technology for carrying out distributed analyses and newly developed algorithms to validate the prediction score. Calibration can easily be implemented in the distributed setting. Discrimination, however, as represented by the ROC curve and its AUC, is challenging. We base our approach on the ROC-GLM algorithm as well as on ideas from differential privacy. The proposed algorithms are evaluated in a simulation study. A real-world application is described: the audit use case of DIFUTURE (Medical Informatics Initiative), whose goal is to validate a treatment prediction rule for patients with newly diagnosed multiple sclerosis.