Title
A First Application of Collaborative Learning In Particle Physics
Authors
Abstract
Over the last ten years, the popularity of Machine Learning (ML) has grown exponentially across all scientific fields, including particle physics. Industry has also developed powerful new tools that, imported into academia, could revolutionise research. One recent industry development that has not yet come to the attention of the particle physics community is Collaborative Learning (CL), a framework that allows the same ML model to be trained on different datasets. This work explores the potential of CL by testing the Colearn library with neutrino physics simulations. Colearn, developed by the Cambridge (UK)-based firm Fetch.AI, enables decentralised machine learning tasks. As a blockchain-mediated CL system, it allows multiple stakeholders to build a shared ML model without relying on a central authority. A generic Liquid Argon Time-Projection Chamber (LArTPC) has been simulated, and images produced by fictitious neutrino interactions have been used to generate several datasets. These datasets, called learners, participated successfully in training a Deep Learning (DL) Keras model in a decentralised way using blockchain technologies. This test explores the feasibility of training a single ML model using different simulation datasets coming from different research groups. In this work, we also discuss a framework that instead makes different ML models compete against each other on the same dataset. The final goal is then to train the most performant ML model across the entire scientific community for a given experiment, either by using all available datasets or by selecting the best-performing model among those developed in the community.
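The core idea of the abstract can be illustrated with a toy round of collaborative training: several learners, each holding a private dataset, propose an update to a shared model, and the accepted update is combined. The sketch below is a minimal, hypothetical illustration using plain gradient averaging on a linear model; it does not use the Colearn API, and in Colearn the accepted update is instead selected by a vote recorded on a blockchain.

```python
import numpy as np

def local_update(weights, data, labels, lr=0.1):
    """One gradient-descent step for a linear model on a single
    learner's private dataset (mean-squared-error loss)."""
    preds = data @ weights
    grad = 2 * data.T @ (preds - labels) / len(labels)
    return weights - lr * grad

def collaborative_round(weights, learners, lr=0.1):
    """Each learner proposes an update trained on its own data.
    Here proposals are simply averaged; in a blockchain-mediated
    system like Colearn, a voting protocol would pick the winner."""
    proposals = [local_update(weights, X, y, lr) for X, y in learners]
    return np.mean(proposals, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])

# Three "learners", each with a different private dataset drawn from
# the same underlying relation (standing in for simulation datasets
# produced by different research groups).
learners = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    learners.append((X, X @ true_w))

# Shared model: every learner trains the same weights, but no single
# party ever sees the others' data.
w = np.zeros(2)
for _ in range(100):
    w = collaborative_round(w, learners)

print(np.allclose(w, true_w, atol=1e-2))
```

The key property, mirroring the abstract, is that the shared weights `w` converge using only locally computed proposals, so the datasets themselves never need to be pooled in one place.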