The probabilistic random forest applied to the QUBRICS survey: improving the selection of high-redshift quasars with synthetic data

Title

The probabilistic random forest applied to the QUBRICS survey: improving the selection of high-redshift quasars with synthetic data

Authors

Guarneri, Francesco; Calderone, Giorgio; Cristiani, Stefano; Porru, Matteo; Fontanot, Fabio; Boutsia, Konstantina; Cupani, Guido; Grazian, Andrea; D'Odorico, Valentina; Murphy, Michael T.; Bongiorno, Angela; Saccheo, Ivano; Nicastro, Luciano

Abstract

Several recent works have focused on the search for bright, high-z quasars (QSOs) in the southern sky. Among them, the QUasars as BRIght beacons for Cosmology in the Southern hemisphere (QUBRICS) survey has now delivered hundreds of new spectroscopically confirmed QSOs selected by means of machine learning algorithms. Building on the results obtained by introducing the probabilistic random forest (PRF) for the QUBRICS selection, in this work we explore the feasibility of training the algorithm on synthetic data to improve the completeness in the higher-redshift bins. We also compare the performance of the algorithm when colours, instead of magnitudes, are used as primary features. We generate synthetic data based on a composite QSO spectral energy distribution. We first train the PRF to identify QSOs among stars and galaxies, and then to separate high-z quasars from low-z contaminants. We apply the algorithm to an updated dataset based on SkyMapper DR3, combined with Gaia eDR3, 2MASS and WISE magnitudes. We find that employing colours as features slightly improves the results relative to the algorithm trained on magnitude data. Adding synthetic data to the training set yields significantly better results than training the PRF only on spectroscopically confirmed QSOs. We estimate, on a testing dataset, a completeness of ~86% and a contamination of ~36%. Finally, 207 PRF-selected candidates were observed: 149/207 turned out to be genuine QSOs with z > 2.5, 41 were QSOs with z < 2.5, 3 were galaxies and 14 were stars. These results confirm the ability of the PRF to select high-z quasars in large datasets.
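The abstract contrasts colours with magnitudes as input features and quotes a completeness, a contamination and a spectroscopic follow-up tally. The Python sketch below illustrates these quantities under stated assumptions; it does not implement the PRF itself. Colours are taken as differences of consecutive bands in a hypothetical band list (`BANDS`, here the SkyMapper filter names; the abstract does not say which colours the authors built), and completeness and contamination use the common definitions TP/(TP+FN) and FP/(TP+FP), which are assumed here rather than taken from the paper. The helper names are illustrative only.

```python
from typing import Dict, List, Sequence

# Hypothetical band ordering for illustration only; the exact bands and colour
# definitions used for the QUBRICS selection are not given in the abstract.
BANDS: List[str] = ["u", "v", "g", "r", "i", "z"]


def adjacent_colours(mags: Dict[str, float], bands: Sequence[str] = BANDS) -> Dict[str, float]:
    """Turn magnitudes into colours by differencing consecutive bands (e.g. g - r)."""
    return {f"{a}-{b}": mags[a] - mags[b] for a, b in zip(bands, bands[1:])}


def completeness(true_pos: int, false_neg: int) -> float:
    """Fraction of genuine targets recovered by the selection: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def contamination(true_pos: int, false_pos: int) -> float:
    """Fraction of selected candidates that are not genuine targets: FP / (TP + FP)."""
    return false_pos / (true_pos + false_pos)


# Spectroscopic follow-up of the PRF-selected candidates, as quoted in the abstract.
followup = {"qso_z_gt_2.5": 149, "qso_z_lt_2.5": 41, "galaxy": 3, "star": 14}
observed = sum(followup.values())
hits = followup["qso_z_gt_2.5"]

print(observed)  # 207
print(round(contamination(hits, observed - hits), 3))  # 0.28
```

The 28% printed here is only the fraction of observed candidates that are not z > 2.5 QSOs. It is not the ~36% contamination estimated on the testing dataset, which refers to a different sample. Completeness cannot be measured from the follow-up alone, because high-z QSOs that the selection missed are never observed.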
