Enhancing Visual Reasoning With LLM-Powered Knowledge Graphs for Visual Question Localized-Answering in Robotic Surgery
Hongqiu Wang, Guang Yang, Lei Zhu
- 发表年份
- 2025
- 引用次数
- 6
摘要
Expert surgeons often have heavy workloads and cannot promptly respond to queries from medical students and junior doctors about surgical procedures. Thus, research on Visual Question Localized-Answering in Surgery (Surgical-VQLA) is essential to assist medical students and junior doctors in understanding surgical scenarios. Surgical-VQLA aims to generate accurate answers and locate relevant areas in the surgical scene, requiring models to identify and understand surgical instruments, operative organs, and procedures. A key issue is the model's ability to accurately distinguish surgical instruments. Current Surgical-VQLA models rely primarily on sparse textual information, limiting their visual reasoning capabilities. To address this issue, we propose a framework called Enhancing Visual Reasoning with LLM-Powered Knowledge Graphs (EnVR-LPKG) for the Surgical-VQLA task. This framework enhances the model's understanding of the surgical scenario by utilizing knowledge graphs of surgical instruments constructed by the Large Language Model (LLM). Specifically, we design a Fine-grained Knowledge Extractor (FKE) to extract the most relevant information from knowledge graphs and perform contrastive learning with the extracted knowledge graphs and local image. Furthermore, we design a Multi-attention-based Surgical Instrument Enhancer (MSIE) module, which employs knowledge graphs to obtain an enhanced representation of the corresponding surgical instrument in the global scene. Through the MSIE module, the model can learn how to fuse visual features with knowledge graph text features, thereby strengthening the understanding of surgical instruments and further improving visual reasoning capabilities. Extensive experimental results on the EndoVis-17-VQLA and EndoVis-18-VQLA datasets demonstrate that our proposed method outperforms other state-of-the-art methods. We will release our code for future research.
关键词
相关论文
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Artificial intelligence: a modern approach
1995
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
A new optimizer using particle swarm theory
R.C. Eberhart, James Kennedy
2002