Paper Title

M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients

Paper Authors

Tao Zhou, Huazhu Fu, Yu Zhang, Changqing Zhang, Xiankai Lu, Jianbing Shen, Ling Shao

Abstract

Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients. Although many OS time prediction methods have been developed and have obtained promising results, several issues remain. First, conventional prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume, which may not represent the full image or model complex tumor patterns. Second, different types of scanners (i.e., multi-modal data) are sensitive to different brain regions, which makes it challenging to effectively exploit the complementary information across multiple modalities while also preserving the modality-specific properties. Third, existing methods focus on prediction models, ignoring complex data-to-label relationships. To address the above issues, we propose an end-to-end OS time prediction model, namely the Multi-modal Multi-channel Network (M2Net). Specifically, we first project the 3D MR volume onto 2D images in different directions, which reduces computational costs while preserving important information and enabling pre-trained models to be transferred from other tasks. Then, we use a modality-specific network to extract implicit and high-level features from different MR scans. A multi-modal shared network is built to fuse these features using a bilinear pooling model, exploiting their correlations to provide complementary information. Finally, we integrate the outputs from each modality-specific network and the multi-modal shared network to generate the final prediction result. Experimental results demonstrate the superiority of our M2Net model over other methods.
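To make the pipeline concrete, the sketch below illustrates the data flow the abstract describes: projecting a 3D MR volume onto 2D views, extracting a per-modality feature vector, and fusing two modalities with bilinear pooling. This is a minimal shape-level sketch, not the paper's implementation: the random linear map stands in for the modality-specific CNN branches, the max-intensity projection is one plausible choice of projection, and the modality names (`t1`, `flair`) and feature dimensions are illustrative assumptions.

```python
import numpy as np

def project_3d_to_2d(volume):
    # Project the 3D volume along each of its three axes
    # (max-intensity projection), yielding three 2D views.
    return [volume.max(axis=a) for a in range(3)]

def modality_features(views, dim=8, seed=0):
    # Stand-in for a modality-specific network: crude spatial pooling
    # of the 2D views followed by a fixed random linear map to a
    # feature vector (a real model would use a CNN here).
    rng = np.random.default_rng(seed)
    x = np.concatenate([v.mean(axis=0) for v in views])
    W = rng.standard_normal((dim, x.size))
    return W @ x

def bilinear_pool(f1, f2):
    # Bilinear pooling: flattened outer product of two modality
    # feature vectors, capturing their pairwise correlations.
    return np.outer(f1, f2).ravel()

# Toy volumes for two hypothetical MR modalities.
rng = np.random.default_rng(42)
t1 = rng.random((4, 5, 6))
flair = rng.random((4, 5, 6))

f_t1 = modality_features(project_3d_to_2d(t1))
f_flair = modality_features(project_3d_to_2d(flair))
fused = bilinear_pool(f_t1, f_flair)  # input to the shared network
```

In the actual M2Net, the fused representation feeds a multi-modal shared network whose output is integrated with each modality-specific branch's own prediction to produce the final OS time estimate.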
