首页 /研究 /SurgRAW: Multi-Agent Workflow With Chain of Thought Reasoning for Robotic Surgical Video Analysis

SURGICAL

SurgRAW: Multi-Agent Workflow With Chain of Thought Reasoning for Robotic Surgical Video Analysis

Chang Han Low, Ziyue Wang, Evangelos B. Mazomenos, Yueming Jin

发表年份: 2026
引用次数: 2

摘要

Robotic-assisted surgery (RAS) is central to modern surgery, driving the need for intelligent systems with accurate scene understanding. Most existing surgical AI methods rely on isolated, task-specific models, leading to fragmented pipelines with limited interpretability and no unified understanding of RAS scene. Vision-Language Models (VLMs) offer strong zero-shot reasoning, but struggle with hallucinations, domain gaps and weak task-interdependency modeling. To address the lack of unified data for RAS scene understanding, we introduce <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">SurgCoTBench</b>, the first reasoning-focused benchmark in RAS, covering 14256 QA pairs with frame-level annotations across five major surgical tasks. Building on SurgCoTBench, we propose <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">SurgRAW</b>, a clinically aligned Chain-of-Thought (CoT) driven agentic workflow for zero-shot multi-task reasoning in surgery. SurgRAW employs a hierarchical reasoning workflow where an orchestrator divides surgical scene understanding into two reasoning streams and directs specialized agents to generate task-level reasoning, while higher-level agents capture workflow interdependencies or ground output clinically. Specifically, we propose a panel discussion mechanism to ensure task-specific agents collaborate synergistically and leverage on task interdependencies. Similarly, we incorporate a retrieval-augmented generation module to enrich agents with surgical knowledge and alleviate domain gaps in general VLMs. We design task-specific CoT prompts grounded in surgical domain to ensure clinically aligned reasoning, reduce hallucinations and enhance interpretability. Extensive experiments show that SurgRAW surpasses mainstream VLMs and agentic systems and outperforms a supervised model by 14.61% accuracy. Dataset and code is available at <uri xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">https://github.com/jinlab-imvr/SurgRAW.git</uri>.

关键词

WorkflowInterpretabilityLeverage (statistics)Domain (mathematical analysis)Benchmark (surveying)Task (project management)Domain knowledgePipeline (software)

SurgRAW: Multi-Agent Workflow With Chain of Thought Reasoning for Robotic Surgical Video Analysis

摘要

关键词

相关论文

A new optimizer using particle swarm theory

Are we ready for autonomous driving? The KITTI vision benchmark suite

VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

A benchmark for the evaluation of RGB-D SLAM systems