Paper title
SemiRetro: Semi-template framework boosts deep retrosynthesis prediction
Paper authors
Paper abstract
Recently, template-based (TB) and template-free (TF) molecular graph learning methods have shown promising results on retrosynthesis. TB methods are more accurate because they use pre-encoded reaction templates, while TF methods are more scalable because they decompose retrosynthesis into subproblems, i.e., center identification and synthon completion. To combine the advantages of both TB and TF, we suggest breaking a full template into several semi-templates and embedding them into the two-step TF framework. Since many semi-templates are duplicated, template redundancy can be reduced while the essential chemical knowledge is still preserved to facilitate synthon completion. We call our method SemiRetro, introduce a new GNN layer (DRGAT) to enhance center identification, and propose a novel self-correcting module to improve semi-template classification. Experimental results show that SemiRetro significantly outperforms existing TB and TF methods. In scalability, SemiRetro covers 98.9% of the data using 150 semi-templates, while the previous template-based GLN requires 11,647 templates to cover 93.3% of the data. In top-1 accuracy, SemiRetro exceeds the template-free G2G by 4.8% (class known) and 6.0% (class unknown). In addition, SemiRetro has better training efficiency than existing methods.
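The redundancy argument in the abstract can be illustrated with a toy count. The sketch below is not the paper's template extraction procedure: it assumes, purely for illustration, that a full template is a pair of per-synthon completion rules, and the rule strings and the helper name `unique_semi_templates` are hypothetical placeholders rather than real reaction SMARTS.

```python
from itertools import chain, product

# Hypothetical per-synthon completion rules (placeholder strings, not real SMARTS).
carbon_side = ["C-[*]>>C-Cl", "C-[*]>>C-Br", "C-[*]>>C-I"]
hetero_side = ["N-[*]>>N-H", "O-[*]>>O-H", "S-[*]>>S-H"]

# Assumption: a full template is one rule per synthon, so every combination
# of the two sides counts as a distinct full template.
full_templates = list(product(carbon_side, hetero_side))


def unique_semi_templates(templates):
    """Return the distinct per-synthon rules that occur in the full templates."""
    return sorted(set(chain.from_iterable(templates)))


semi_templates = unique_semi_templates(full_templates)
num_full = len(set(full_templates))
num_semi = len(semi_templates)
print(f"full templates: {num_full}, semi-templates: {num_semi}")
```

In this toy setting the number of full templates grows with the product of the two sides, while the number of semi-templates grows only with their sum, which is the kind of reduction the abstract reports at much larger scale.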