Paper title

Privacy-preserving collaborative machine learning on genomic data using TensorFlow

Paper authors

Hong, Cheng; Huang, Zhicong; Lu, Wen-jie; Qu, Hunter; Ma, Li; Dahl, Morten; Mancuso, Jason

Paper abstract

Machine learning (ML) methods have been widely used in genomic studies. However, genomic data are often held by different stakeholders (e.g., hospitals, universities, and healthcare companies) who regard the data as sensitive, even when they wish to collaborate. To address this issue, recent works have proposed solutions based on Secure Multi-party Computation (MPC), which train on the decentralized data such that the participants learn nothing from each other beyond the final trained model. We design and implement several MPC-friendly ML primitives, including class weight adjustment and a parallelizable approximation of the activation function. In addition, we develop the solution as an extension to TF Encrypted (Dahl et al., 2018), enabling us to quickly experiment with enhancements of both machine learning techniques and cryptographic protocols while leveraging the advantages of TensorFlow's optimizations. Our implementation compares favorably with state-of-the-art methods, winning first place in Track IV of the iDASH 2019 secure genome analysis competition.
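As a rough illustration of the MPC idea mentioned in the abstract, the sketch below shows additive secret sharing, one common building block of such protocols. It is not the protocol used in the paper or in TF Encrypted; the ring size `MODULUS`, the helper names `share`, `reconstruct`, and `add_shared`, and the example counts are all assumptions made for this example.

```python
import secrets

MODULUS = 2**64  # ring size; an assumption for this sketch


def share(value, n_parties=3):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine all shares; any strict subset looks uniformly random."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally; no communication is needed."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


# Two hypothetical data holders secret-share their private counts.
hospital = share(120)
university = share(45)

# The sum is computed on shares and revealed only at the end.
total = reconstruct(add_shared(hospital, university))
print(total)  # 165
```

Addition on shares is purely local, which is one reason MPC-friendly ML tends to favor operations built from additions and multiplications, such as polynomial approximations of activation functions.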
