Paper Title
Interpretable HER2 scoring by evaluating clinical guidelines through a weakly supervised, constrained deep learning approach
Paper Authors
Paper Abstract
The evaluation of Human Epidermal growth factor Receptor-2 (HER2) expression is an important prognostic biomarker for breast cancer treatment selection. However, HER2 scoring has notoriously high interobserver variability due to stain variations between centers and the need to visually estimate the staining intensity in specific percentages of the tumor area. In this paper, focusing on the interpretability of HER2 scoring by a pathologist, we propose a semi-automatic, two-stage deep learning approach that directly evaluates the clinical HER2 guidelines defined by the American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP). In the first stage, we segment the invasive tumor over the user-indicated Region of Interest (ROI). Then, in the second stage, we classify the tumor tissue into four HER2 classes. For the classification stage, we use weakly supervised, constrained optimization to find a model that classifies cancerous patches such that the tumor surface percentages meet the guideline specification for each HER2 class. We end the second stage by freezing the model and refining its output logits in a supervised way using all slide labels in the training set. To ensure the quality of our dataset's labels, we conducted a multi-pathologist HER2 scoring consensus. For the assessment of doubtful cases where no consensus was found, our model can assist by providing an interpretable HER2 class-percentage output. We achieve an F1-score of 0.78 on the test set while keeping our model interpretable for the pathologist, hopefully contributing to interpretable AI models in digital pathology.
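To make the constrained, weakly supervised formulation more concrete, the minimal sketch below shows one way a guideline-consistency penalty on slide-level tumor-surface percentages could look. It is an illustration only, not the paper's implementation: the patch classifier, the exact ASCO/CAP thresholds (the 10% rule is a simplification), and the hinge-style penalty are assumptions introduced here; only the weak slide-level HER2 label is assumed to be available.

```python
# Illustrative sketch of a weakly supervised, guideline-based constraint.
# Assumptions (not from the paper): a patch classifier already produces
# softmax scores over four staining classes (0, 1+, 2+, 3+), and the
# ASCO/CAP rule is reduced to a single 10% tumor-surface threshold.
import torch


def class_percentages(patch_probs: torch.Tensor) -> torch.Tensor:
    """Aggregate per-patch softmax scores (N patches x 4 classes) into
    soft tumor-surface percentages for one slide."""
    return patch_probs.mean(dim=0)  # shape (4,)


def guideline_penalty(percentages: torch.Tensor, slide_label: int,
                      threshold: float = 0.10) -> torch.Tensor:
    """Hinge penalty that is zero when the percentages are consistent with
    the (assumed) rule for the weak slide label, e.g. a 3+ slide should show
    the 3+ staining class on more than ~10% of the tumor surface."""
    if slide_label == 0:
        # Score 0: stained classes should together stay at or below the threshold.
        return torch.relu(percentages[1:].sum() - threshold)
    # Scores 1+/2+/3+: the corresponding staining class should exceed the threshold.
    return torch.relu(threshold - percentages[slide_label])


# Toy usage: 200 random patch predictions for a slide weakly labeled as 2+.
patch_probs = torch.softmax(torch.randn(200, 4), dim=1)
pct = class_percentages(patch_probs)
loss = guideline_penalty(pct, slide_label=2)
print(pct, loss)
```

Because the percentages are computed from differentiable patch scores, a penalty of this kind can be added to the training loss so that patch-level predictions are pushed toward tumor-surface percentages compatible with the slide-level HER2 class, which is the spirit of the constrained optimization described in the abstract.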