Do segmentation metrics reflect clinical reality? A surgeon-centered evaluation in robot-assisted minimally invasive esophagectomy

Ronald de Jong, Yiping Li, Romy van Jaarsveld, Gino M. Kuiper, Richard van Hillegersberg, Jelle P. Ruurda, Josien P. W. Pluim, Marcel Breeuwer, Yasmina Al Khalil

Year: 2025
Citations: 1
Access: Open access

Abstract

BACKGROUND: Deep learning-based anatomy segmentation holds promise for improving real-time guidance in complex surgeries such as robot-assisted minimally invasive esophagectomy (RAMIE). However, the clinical relevance of commonly used metrics for evaluating segmentation quality remains unclear, as previous assessments have lacked direct input from surgeons. This study aims to assess how well quantitative segmentation metrics reflect surgeons' assessments of anatomical overlay accuracy and clinical usefulness during RAMIE.

METHODS: We conducted a survey involving 26 upper gastrointestinal surgeons, including both trainee and attending surgeons, who assessed video clips of RAMIE procedures featuring deep learning-generated anatomical overlays. We correlated the surgeons' qualitative evaluations of annotation accuracy and clinical usefulness with a comprehensive set of quantitative metrics, including overlap, distance, temporal, and error-specific measures. The analysis encompassed over 8000 manually annotated frames from 12 video clips, with overlays generated by two state-of-the-art deep learning models.

RESULTS: Overlap and temporal consistency metrics show the strongest correlation with surgeon assessments. Distance-based and error-specific metrics correlate moderately. Novices show weaker correlations and tend to rate overlays more leniently. Qualitative feedback reveals issues like hallucinations and instability, often missed by current metrics.

CONCLUSION: Standard quantitative metrics partially reflect surgeon perceptions but should be complemented by surgeon-informed evaluations and task-specific metrics to better capture clinically relevant errors. Aligning metric design with surgical expertise is essential for the safe and effective integration of AI-guided anatomical segmentation in the operating room.
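The correlation step described in the methods can be illustrated with a minimal sketch. The abstract does not name the specific overlap metric or correlation coefficient used, so the choices here are assumptions: the Dice coefficient stands in for an overlap metric, and Spearman rank correlation stands in for the comparison with surgeon ratings. All function names and the per-clip numbers are hypothetical placeholders, not values from the study.

```python
# Illustrative sketch only: Dice and Spearman are assumed choices, not
# necessarily the metric or statistic used in the study.

def dice(pred, truth):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    inter = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * inter / total

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-clip values, for illustration only.
clip_dice = [0.62, 0.71, 0.80, 0.88]   # mean Dice per video clip
clip_rating = [2, 3, 3, 5]             # mean surgeon rating per clip
print(spearman(clip_dice, clip_rating))
```

A rank-based coefficient is a common choice when one variable is an ordinal rating scale, since it does not assume that rating steps are equally spaced. The same pattern would apply to any per-clip metric paired with per-clip ratings.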

Keywords

Segmentation; Metric (unit); Esophagectomy; Key (lock); MEDLINE; Image segmentation
