RoboNurse-VLA: Robotic Scrub Nurse System based on Vision-Language-Action Model
Shunlei Li, Jin Wang, Rui Dai, Wanyu Ma, Wing Yin Ng, Yingbai Hu, Zheng Li
- Publication year
- 2025
- Citations
- 4
Abstract
In modern healthcare, the demand for autonomous robotic assistants has grown significantly, particularly in the operating room, where surgical tasks require precision and reliability. Robotic scrub nurses have emerged as a promising solution to improve efficiency and reduce human error during surgery. However, challenges remain in terms of accurately grasping and handing over surgical instruments, especially when dealing with complex objects in dynamic environments. In this work, we introduce RoboNurse-VLA, a novel robotic scrub nurse system based on a Vision-Language-Action (VLA) model. RoboNurse-VLA integrates Segment Anything Model 2 (SAM 2) and Llama 2, leveraging an LLM head to enhance reasoning capabilities. By combining SAM 2’s mask generation with Llama 2’s advanced reasoning, RoboNurse-VLA can accurately interpret task requirements, identify optimal grasping points, and determine appropriate handover poses. Designed for real-time operation, RoboNurse-VLA enables precise grasping and seamless handover of surgical instruments based on voice commands from the surgeon. Utilizing state-of-the-art vision and language models, it effectively addresses challenges related to object detection, pose optimization, and handling difficult-to-grasp instruments. Extensive evaluations demonstrate that RoboNurse-VLA outperforms existing models, achieving high success rates in surgical instrument handovers, even for previously unseen tools and complex objects. This work represents a significant advancement in autonomous surgical assistance, highlighting the potential of VLA models for real-world medical applications. More details can be found at https://robonurse-vla.github.io.