Do segmentation metrics reflect clinical reality? A surgeon-centered evaluation in robot-assisted minimally invasive esophagectomy
Ronald de Jong, Yiping Li, Romy van Jaarsveld, Gino M. Kuiper, Richard van Hillegersberg, Jelle P. Ruurda, Josien P. W. Pluim, Marcel Breeuwer, Yasmina Al Khalil
- Publication year: 2025
- Citations: 1
- Access: Open access
Abstract
BACKGROUND: Deep learning-based anatomy segmentation holds promise for improving real-time guidance in complex surgeries such as robot-assisted minimally invasive esophagectomy (RAMIE). However, the clinical relevance of commonly used metrics for evaluating segmentation quality remains unclear, as previous assessments have lacked direct input from surgeons. This study aims to assess how well quantitative segmentation metrics reflect surgeons' assessments of anatomical overlay accuracy and clinical usefulness during RAMIE.

METHODS: We conducted a survey involving 26 upper gastrointestinal surgeons, including both trainee and attending surgeons, who assessed video clips of RAMIE procedures featuring deep learning-generated anatomical overlays. We correlated the surgeons' qualitative evaluations of annotation accuracy and clinical usefulness with a comprehensive set of quantitative metrics, including overlap, distance, temporal, and error-specific measures. The analysis encompassed over 8000 manually annotated frames from 12 video clips, with overlays generated by two state-of-the-art deep learning models.

RESULTS: Overlap and temporal consistency metrics show the strongest correlation with surgeon assessments. Distance-based and error-specific metrics correlate moderately. Novices show weaker correlations and tend to rate overlays more leniently. Qualitative feedback reveals issues like hallucinations and instability, often missed by current metrics.

CONCLUSION: Standard quantitative metrics partially reflect surgeon perceptions but should be complemented by surgeon-informed evaluations and task-specific metrics to better capture clinically relevant errors. Aligning metric design with surgical expertise is essential for the safe and effective integration of AI-guided anatomical segmentation in the operating room.
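As a rough illustration of the kind of analysis the METHODS paragraph describes, the sketch below computes one overlap metric for a predicted mask and rank-correlates per-clip metric values with per-clip ratings. The abstract does not name the specific metrics or the correlation statistic, so the choice of the Dice coefficient and Spearman's rank correlation is an assumption, as are all function names and the toy numbers, which are invented and not taken from the study.

```python
import numpy as np


def dice(pred, truth):
    """Dice overlap between two boolean masks; returns 1.0 when both are empty."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, truth).sum() / total)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Toy masks (hypothetical): a 4-pixel structure and a 6-pixel prediction covering it.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True
frame_dice = dice(pred, truth)

# Hypothetical per-clip values: mean Dice per clip and a 1-5 rating per clip.
clip_dice = [0.91, 0.62, 0.78, 0.45, 0.85]
surgeon_rating = [5, 3, 4, 2, 4]
rho = spearman(clip_dice, surgeon_rating)

print(f"Dice for the toy frame: {frame_dice:.2f}")
print(f"Spearman correlation across clips: {rho:.3f}")
```

A rank-based statistic is used here because ratings of this kind are ordinal; whether the study used this statistic, or aggregated frames per clip in this way, is not stated in the abstract.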