Paper Title
Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics
Paper Authors
Paper Abstract
Learning good representations of giga-pixel whole slide pathology images (WSIs) for downstream tasks is critical. Previous studies employ multiple instance learning (MIL) to represent a WSI as a bag of sampled patches because, in most cases, only slide-level labels are available and only a tiny region of the WSI is disease-positive. However, WSI representation learning remains an open problem because: (1) patches sampled at higher magnification may fail to capture microenvironment information, such as the relative position between tumor cells and surrounding tissues, while patches at lower magnification lose fine-grained detail; and (2) extracting patches from a giant WSI results in a large bag size, which greatly increases the computational cost. To address these problems, this paper proposes a hierarchical multimodal transformer framework that learns a hierarchical mapping between pathology images and the corresponding genes. Specifically, we randomly extract instance-level patch features from WSIs at different magnifications. A co-attention mapping between imaging and genomics is then learned to uncover pairwise interactions and reduce the space complexity of the imaging features. This early fusion makes it computationally feasible to apply a MIL Transformer to the survival prediction task. Our architecture requires fewer GPU resources than benchmark methods while maintaining better WSI representation ability. We evaluate our approach on five cancer types from The Cancer Genome Atlas database and achieve an average c-index of $0.673$, outperforming state-of-the-art multimodal methods.
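To make the described pipeline concrete, below is a minimal PyTorch sketch of the genomics-guided co-attention early fusion: genomic tokens act as queries over a large multi-magnification patch bag, compressing it to a few genomics-aligned tokens before a small MIL Transformer and a survival head. All module names, shapes, and hyperparameters (GeneGuidedCoAttention, SurvivalTransformer, dim=256, n_bins=4, etc.) are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class GeneGuidedCoAttention(nn.Module):
    """Genomic embeddings attend over patch embeddings, compressing a large
    WSI bag (thousands of patches, N) into a few genomics-aligned tokens (G)."""
    def __init__(self, dim=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, gene_tokens, patch_tokens):
        # gene_tokens: (B, G, dim); patch_tokens: (B, N, dim) with N >> G.
        fused, _ = self.attn(query=gene_tokens, key=patch_tokens, value=patch_tokens)
        return fused  # (B, G, dim): bag size reduced from N to G

class SurvivalTransformer(nn.Module):
    def __init__(self, dim=256, n_heads=4, depth=2, n_bins=4):
        super().__init__()
        self.coattn = GeneGuidedCoAttention(dim, n_heads)
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, n_bins)  # discrete-time hazard logits

    def forward(self, gene_tokens, patch_tokens_per_scale):
        # One co-attention pass per magnification level, then concatenate
        # the fused tokens so the encoder sees all scales jointly.
        fused = [self.coattn(gene_tokens, p) for p in patch_tokens_per_scale]
        h = self.encoder(torch.cat(fused, dim=1))
        return self.head(h.mean(dim=1))

# Toy shapes: 6 genomic tokens; two magnifications with 4096 and 1024 patches.
model = SurvivalTransformer()
genes = torch.randn(1, 6, 256)
patches = [torch.randn(1, 4096, 256), torch.randn(1, 1024, 256)]
print(model(genes, patches).shape)  # torch.Size([1, 4])
```

Because the Transformer encoder only ever sees the handful of fused tokens rather than the raw patch bag, its quadratic attention cost no longer scales with the number of patches, which is the computational saving the abstract attributes to early fusion.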
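The reported metric is Harrell's concordance index (c-index): the fraction of comparable patient pairs whose predicted risks are ordered consistently with their observed survival times. A hedged NumPy sketch follows; the toy times, events, and risk scores are hypothetical, and higher score means higher predicted risk.

```python
import numpy as np

def concordance_index(times, risks, events):
    """Fraction of comparable patient pairs the model orders correctly.
    A pair (i, j) is comparable only if the patient with the earlier time
    had an observed event (censored follow-ups give no ordering)."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:  # i failed before j
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties count half
    return concordant / comparable

times = np.array([5.0, 12.0, 7.0, 20.0])
events = np.array([1, 0, 1, 1])          # 0 = censored
risks = np.array([0.9, 0.2, 0.6, 0.1])   # model output
print(concordance_index(times, risks, events))  # 1.0: perfectly ordered
```

A c-index of $0.5$ corresponds to random ordering and $1.0$ to perfect ordering, so the reported average of $0.673$ sits between the two.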