Think Step by Step: Chain-of-Gesture Prompting for Error Detection in Robotic Surgical Videos
Zhimin Shao, Jialang Xu, Danail Stoyanov, Evangelos B. Mazomenos, Yueming Jin
- 发表年份
- 2024
- 访问权限
- 开放获取
摘要
Despite significant advancements in robotic systems and surgical data science, ensuring safe and optimal execution in robot-assisted minimally invasive surgery (RMIS) remains a complex challenge. Current surgical error detection methods involve two parts: identifying surgical gestures and then detecting errors within each gesture clip. These methods seldom consider the rich contextual and semantic information inherent in surgical videos, limiting their performance due to reliance on accurate gesture identification. Motivated by the chain-of-thought prompting in natural language processing, this letter presents a novel and real-time end-to-end error detection framework, Chain-of-Thought (COG) prompting, leveraging contextual information from surgical videos. This encompasses two reasoning modules designed to mimic the decision-making processes of expert surgeons. Concretely, we first design a Gestural-Visual Reasoning module, which utilizes transformer and attention architectures for gesture prompting, while the second, a Multi-Scale Temporal Reasoning module, employs a multi-stage temporal convolutional network with both slow and fast paths for temporal information extraction. We extensively validate our method on the public benchmark RMIS dataset JIGSAWS. Our method encapsulates the reasoning processes inherent to surgical activities enabling it to outperform the state-of-the-art by 4.6% in F1 score, 4.6% in Accuracy, and 5.9% in Jaccard index while processing each frame in 6.69 milliseconds on average, demonstrating the great potential of our approach in enhancing the safety and efficacy of RMIS procedures and surgical education. The code will be available.
关键词
相关论文
机器人技术在整形外科中的应用
Vijay Kumar, Sandhya Pandey
Clinical Journal of Plastic & Reconstructive Surgery · 2026
SurfSurg6D:面向无纹理手术器械的几何一致密集对应位姿估计
Daiyun Shen, Shuojue Yang, Chang Han Low 等 7 位作者
2026
EndoGSim:基于MLLM引导的高斯泼溅的物理感知4D动态内窥镜场景模拟
Changjing Liu, Yiming Huang, Long Bai 等 5 位作者
2026
腹膜后机器人辅助肾输尿管切除术:技术描述与单中心经验
Kawashima A, Ishizuya Y, Yamamoto Y 等 12 位作者
Asian journal of endoscopic surgery · 2026