Paper title
A machine learning approach to galaxy properties: joint redshift-stellar mass probability distributions with Random Forest
Paper authors
Paper abstract
We demonstrate that highly accurate joint redshift-stellar mass probability distribution functions (PDFs) can be obtained using the Random Forest (RF) machine learning (ML) algorithm, even with few photometric bands available. As an example, we use the Dark Energy Survey (DES), combined with the COSMOS2015 catalogue for redshifts and stellar masses. We build two ML models: one containing deep photometry in the $griz$ bands, and the second reflecting the photometric scatter present in the main DES survey, with carefully constructed representative training data in each case. We validate our joint PDFs for $10,699$ test galaxies by utilizing the copula probability integral transform and the Kendall distribution function, and their univariate counterparts to validate the marginals. Benchmarked against a basic set-up of the template-fitting code BAGPIPES, our ML-based method outperforms template fitting on all of our predefined performance metrics. In addition to accuracy, the RF is extremely fast, able to compute joint PDFs for a million galaxies in just under $6$ min with consumer computer hardware. Such speed enables PDFs to be derived in real time within analysis codes, solving potential storage issues. As part of this work we have developed GALPRO, a highly intuitive and efficient Python package to rapidly generate multivariate PDFs on-the-fly. GALPRO is documented and available for researchers to use in their cosmology and galaxy evolution studies.
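The validation step mentioned in the abstract can be illustrated with a small sketch. It assumes each galaxy's PDF is represented by a set of (redshift, stellar mass) samples, and it uses toy correlated Gaussian draws in place of DES and COSMOS2015 data. The function names (`marginal_pit`, `joint_cdf`, `copula_pit`, `draw`) are hypothetical and are not GALPRO's API. The marginal PIT is taken as the fraction of samples at or below the true value, and the copula PIT as the empirical Kendall distribution function evaluated at the joint CDF of the true point. For well-calibrated PDFs both should be close to uniform on [0, 1].

```python
import random


def marginal_pit(samples, truth):
    """Univariate PIT: fraction of PDF samples at or below the true value."""
    return sum(s <= truth for s in samples) / len(samples)


def joint_cdf(samples, point):
    """Empirical joint CDF H(point) built from (redshift, mass) samples."""
    z, m = point
    return sum(sz <= z and sm <= m for sz, sm in samples) / len(samples)


def copula_pit(samples, truth):
    """Empirical Kendall distribution function evaluated at H(truth)."""
    h_truth = joint_cdf(samples, truth)
    h_samples = [joint_cdf(samples, s) for s in samples]
    return sum(h <= h_truth for h in h_samples) / len(samples)


def draw(rng, mu_z, mu_m, rho=0.6):
    """One draw from a toy correlated Gaussian in (redshift, log mass).

    The widths (0.1, 0.3) and correlation rho are arbitrary assumptions.
    """
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    z = mu_z + 0.1 * a
    m = mu_m + 0.3 * (rho * a + (1 - rho ** 2) ** 0.5 * b)
    return (z, m)


rng = random.Random(0)
z_pits, m_pits, cop_pits = [], [], []

for _ in range(300):  # 300 toy "galaxies"
    mu_z, mu_m = rng.uniform(0.2, 1.2), rng.uniform(9.0, 11.0)
    # The truth and the PDF samples share one distribution,
    # so these toy PDFs are calibrated by construction.
    truth = draw(rng, mu_z, mu_m)
    samples = [draw(rng, mu_z, mu_m) for _ in range(100)]
    z_pits.append(marginal_pit([s[0] for s in samples], truth[0]))
    m_pits.append(marginal_pit([s[1] for s in samples], truth[1]))
    cop_pits.append(copula_pit(samples, truth))

print("mean redshift PIT:", sum(z_pits) / len(z_pits))
print("mean mass PIT:    ", sum(m_pits) / len(m_pits))
print("mean copula PIT:  ", sum(cop_pits) / len(cop_pits))
```

In practice one would histogram these values instead of only inspecting their means: a U-shaped histogram suggests PDFs that are too narrow, and a peaked one suggests PDFs that are too broad.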