Paper Title

Towards Holistic Surgical Scene Understanding

Paper Authors

Natalia Valderrama, Paola Ruiz Puentes, Isabela Hernández, Nicolás Ayobi, Mathilde Verlyk, Jessica Santander, Juan Caicedo, Nicolás Fernández, Pablo Arbeláez

Paper Abstract

Most benchmarks for studying surgical interventions focus on a specific challenge instead of leveraging the intrinsic complementarity among different tasks. In this work, we present a new experimental framework towards holistic surgical scene understanding. First, we introduce the Phase, Step, Instrument, and Atomic Visual Action recognition (PSI-AVA) Dataset. PSI-AVA includes annotations for both long-term (Phase and Step recognition) and short-term reasoning (Instrument detection and novel Atomic Action recognition) in robot-assisted radical prostatectomy videos. Second, we present Transformers for Action, Phase, Instrument, and steps Recognition (TAPIR) as a strong baseline for surgical scene understanding. TAPIR leverages our dataset's multi-level annotations as it benefits from the learned representation on the instrument detection task to improve its classification capacity. Our experimental results in both PSI-AVA and other publicly available databases demonstrate the adequacy of our framework to spur future research on holistic surgical scene understanding.
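The abstract describes TAPIR as a multi-level model that reuses representations learned on instrument detection to drive the phase, step, instrument, and atomic-action classification tasks. As a rough illustration of that idea only, the sketch below shows a generic multi-task head over shared clip-level and per-box features; all class counts, feature dimensions, and module names are placeholder assumptions for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Placeholder task sizes and feature dimension -- assumptions for illustration,
# not values taken from the PSI-AVA dataset or the TAPIR model.
NUM_PHASES, NUM_STEPS, NUM_INSTRUMENTS, NUM_ACTIONS = 11, 21, 7, 16
FEAT_DIM = 768


class MultiTaskHead(nn.Module):
    """Hypothetical multi-task head over shared detection features."""

    def __init__(self, feat_dim: int = FEAT_DIM):
        super().__init__()
        # Long-term tasks (phase, step) are predicted from a clip-level feature.
        self.phase_head = nn.Linear(feat_dim, NUM_PHASES)
        self.step_head = nn.Linear(feat_dim, NUM_STEPS)
        # Short-term tasks (instrument, atomic action) are predicted from
        # per-box features produced by an instrument detector.
        self.instrument_head = nn.Linear(feat_dim, NUM_INSTRUMENTS)
        self.action_head = nn.Linear(feat_dim, NUM_ACTIONS)

    def forward(self, video_feat: torch.Tensor, box_feats: torch.Tensor):
        # video_feat: (B, D) clip-level feature
        # box_feats:  (B, N, D) features of N detected instrument boxes
        return {
            "phase": self.phase_head(video_feat),
            "step": self.step_head(video_feat),
            "instrument": self.instrument_head(box_feats),
            "action": self.action_head(box_feats),
        }


if __name__ == "__main__":
    head = MultiTaskHead()
    video_feat = torch.randn(2, FEAT_DIM)    # batch of 2 clips
    box_feats = torch.randn(2, 5, FEAT_DIM)  # 5 detected boxes per clip
    for task, logits in head(video_feat, box_feats).items():
        print(task, tuple(logits.shape))
```

The point of the sketch is the structural split the abstract implies: video-level features feed the long-term recognition tasks, while the detector's box-level features feed the short-term ones, so improvements in the shared detection representation can benefit all classification heads.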
