Paper Title
MolScribe: Robust Molecular Structure Recognition with Image-To-Graph Generation
Paper Authors
Paper Abstract
Molecular structure recognition is the task of translating a molecular image into its graph structure. Wide variation in the drawing styles and conventions used in the chemical literature poses a significant challenge for automating this task. In this paper, we propose MolScribe, a novel image-to-graph generation model that explicitly predicts atoms and bonds, along with their geometric layouts, to construct the molecular structure. Our model flexibly incorporates symbolic chemistry constraints to recognize chirality and expand abbreviated structures. We further develop data augmentation strategies to enhance the model's robustness against domain shifts. In experiments on both synthetic and realistic molecular images, MolScribe significantly outperforms previous models, achieving 76-93% accuracy on public benchmarks. Chemists can also easily verify MolScribe's predictions, informed by its confidence estimation and atom-level alignment with the input image. MolScribe is publicly available through Python and web interfaces: https://github.com/thomas0809/MolScribe.
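To make the image-to-graph idea concrete, the following is a minimal sketch of how predicted atoms (with 2D coordinates) and bonds might be assembled into a molecular graph. It is an illustration only and does not reproduce MolScribe's actual output format or API: the names `Atom`, `Bond`, and `build_graph`, the normalized coordinates, and the example molecule are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical predicted atom: element symbol plus image coordinates."""
    symbol: str
    x: float  # assumed normalized to [0, 1] across the image width
    y: float  # assumed normalized to [0, 1] across the image height


@dataclass(frozen=True)
class Bond:
    """Hypothetical predicted bond between two atom indices."""
    begin: int
    end: int
    order: int  # 1 = single, 2 = double, 3 = triple


def build_graph(atoms, bonds):
    """Return an adjacency list {atom_index: [(neighbor_index, bond_order), ...]}."""
    adjacency = {i: [] for i in range(len(atoms))}
    for bond in bonds:
        # Reject bonds that point at atoms the model did not predict.
        if not (0 <= bond.begin < len(atoms) and 0 <= bond.end < len(atoms)):
            raise ValueError(f"bond references unknown atom: {bond}")
        if bond.begin == bond.end:
            raise ValueError(f"self-loop bond is not allowed: {bond}")
        # Bonds are undirected, so record both directions.
        adjacency[bond.begin].append((bond.end, bond.order))
        adjacency[bond.end].append((bond.begin, bond.order))
    return adjacency


# Heavy atoms of ethanol (C-C-O) with made-up coordinates.
atoms = [Atom("C", 0.1, 0.5), Atom("C", 0.5, 0.3), Atom("O", 0.9, 0.5)]
bonds = [Bond(0, 1, 1), Bond(1, 2, 1)]
graph = build_graph(atoms, bonds)
```

Keeping the coordinates alongside each atom is what would allow a prediction to be overlaid on the input image for atom-level verification, as the abstract describes.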