Vision Language Model Empowered Surgical Planning
Yi-He Chen, Runsheng Yu, Xin Wang, Wensheng Wang, Ning Tan, Youzhi Zhang
- Publication year
- 2024
- Citations
- 1
Abstract
Integrating a flexible endoscope with a surgical manipulator is crucial in minimally invasive surgery (MIS) because it provides detailed visualization of the operative field inside the patient’s body. During MIS, remote center of motion (RCM) constraints are essential for achieving visual servoing control and for ensuring accurate tracking control of the robotic endoscope. Existing work requires an exact trajectory for tracking control and does not link the two tasks with the RCM constraints. In this paper, we exploit GPT-V to develop Vision Language Model Empowered surgical Planning (VLM-EP), which uses environmental observations and a task description to complete the tracking task without an exact trajectory, and which connects the two tasks through an exploration procedure within the in vivo safety range. Simulated experiments show that VLM-EP significantly outperforms the state-of-the-art control-based baseline. We also demonstrate a practical implementation of VLM-EP in real-world scenarios, showing that it effectively handles both the tracking control task and the visual servoing control task.