Paper title
Modelling fertility potential in survivors of childhood cancer: An introduction to modern statistical and computational methods
Authors
Abstract
Statistical and computational methods are widely used in modern scientific studies. Using female fertility potential in childhood cancer survivors as an example, we illustrate how these methods can extract insight about biological processes from noisy observational data to inform decision making. We start by placing the computational methods in the context of a working example: modelling acute ovarian failure risk in female childhood cancer survivors, to quantify the risk of permanent ovarian failure due to exposure to lifesaving but nonetheless toxic cancer treatments. This is followed by a description of the general framework of classification problems. We provide an overview of the modelling algorithms employed in our example, including one classic model (logistic regression) and two popular modern learning methods (random forests and support vector machines). Using the working example, we show the general steps of data preparation for modelling, the variable selection steps for the classic model, and how visualization tools might be used to improve model performance. We end with a note on the importance of model evaluation.
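As a rough illustration of the classification framework and the held-out evaluation the abstract refers to, the following minimal sketch fits a one-variable logistic regression by gradient descent and scores it on data not used for fitting. Everything in it is an assumption made for illustration: the data are synthetic, the single "dose-like" feature and its effect size are hypothetical, and the helper names (`make_synthetic`, `fit_logistic`, `accuracy`) are not taken from the paper. It is not the authors' model or data.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic(n, rng):
    # Hypothetical data: one feature x in [0, 1); the event probability
    # rises with x. These numbers are invented for illustration only.
    rows = []
    for _ in range(n):
        x = rng.random()
        p = sigmoid(6.0 * (x - 0.5))
        y = 1 if rng.random() < p else 0
        rows.append((x, y))
    return rows


def fit_logistic(rows, lr=0.5, epochs=2000):
    # Fit intercept b0 and slope b1 by batch gradient descent on the log-loss.
    b0, b1 = 0.0, 0.0
    n = len(rows)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in rows:
            err = sigmoid(b0 + b1 * x) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def accuracy(rows, b0, b1, threshold=0.5):
    # Fraction of rows whose thresholded prediction matches the label.
    hits = sum((sigmoid(b0 + b1 * x) >= threshold) == (y == 1) for x, y in rows)
    return hits / len(rows)


rng = random.Random(0)            # fixed seed so the run is repeatable
data = make_synthetic(400, rng)
train, test = data[:300], data[300:]   # simple hold-out split
b0, b1 = fit_logistic(train)
test_acc = accuracy(test, b0, b1)
print(f"slope={b1:.2f}, held-out accuracy={test_acc:.2f}")
```

Scoring on the held-out rows rather than the training rows is the point of the sketch: accuracy measured on the data used for fitting tends to be optimistic. Random forests and support vector machines would slot into the same split-fit-evaluate pattern, typically through an established library rather than hand-written code.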