Paper Title

Chemical transformer compression for accelerating both training and inference of molecular modeling

Paper Authors

Yi Yu, Karl Börjesson

Abstract

Transformer models have been developed in molecular science with excellent performance in applications including quantitative structure-activity relationship (QSAR) modeling and virtual screening (VS). Compared with other types of models, however, they are large, which leads to high hardware requirements for reducing the time needed for both training and inference. In this work, cross-layer parameter sharing (CLPS) and knowledge distillation (KD) are used to reduce the size of transformers in molecular science. Both methods not only yield QSAR predictive performance competitive with the original BERT model, but are also more parameter efficient. Furthermore, by integrating CLPS and KD into a two-state chemical network, we introduce a new deep lite chemical transformer model, DeLiCaTe. DeLiCaTe captures general-domain as well as task-specific knowledge, which leads to a 4x faster rate of both training and inference owing to a 10- and 3-fold reduction in the number of parameters and layers, respectively. Meanwhile, it achieves comparable performance in QSAR and VS modeling. Moreover, we anticipate that the model compression strategy provides a pathway to the creation of effective generative transformer models for organic drug and material design.
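
The abstract names two standard compression techniques, cross-layer parameter sharing (CLPS) and knowledge distillation (KD). The sketch below is not the authors' DeLiCaTe implementation; it is a minimal PyTorch illustration, under assumed layer sizes, temperature, and loss weighting, of how a single transformer layer can be reused across depth positions (CLPS) and how a compact student can be trained against a larger teacher's soft targets (KD).

```python
# Minimal, illustrative sketch of CLPS and KD (not the paper's released code).
# Dimensions, depth, temperature, and the alpha weighting are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedLayerEncoder(nn.Module):
    """Transformer encoder that applies the SAME layer `depth` times (CLPS)."""

    def __init__(self, d_model=256, n_heads=4, depth=6):
        super().__init__()
        self.depth = depth
        # Only one layer's parameters exist, so the parameter count is roughly
        # 1/depth of an unshared stack with the same number of layers.
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True,
        )

    def forward(self, x):
        for _ in range(self.depth):
            x = self.layer(x)  # reuse identical weights at every depth position
        return x


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard KD objective: soft-target KL term plus hard-label cross entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    # Toy forward pass on random "token embeddings" for a SMILES-like sequence.
    x = torch.randn(8, 64, 256)            # (batch, sequence length, d_model)
    student = SharedLayerEncoder(depth=6)
    h = student(x)

    # Hypothetical classifier head for a binary QSAR task; teacher logits would
    # normally come from the full-size pretrained chemical BERT.
    head = nn.Linear(256, 2)
    student_logits = head(h.mean(dim=1))
    teacher_logits = torch.randn(8, 2)      # placeholder for the teacher's outputs
    labels = torch.randint(0, 2, (8,))
    print(distillation_loss(student_logits, teacher_logits, labels).item())
```

In this toy setup the shared layer keeps the representational depth of a six-layer encoder while storing only one layer's weights, which is the same trade-off the abstract attributes to the reported parameter and layer reductions.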
