Desmoking of the Endoscopic Surgery Images Based on a Local-Global U-Shaped Transformer Model
Wanqing Wang, Fucheng Liu, Jianxiong Hao, Xiangyang Yu, Bo Zhang, Chaoyang Shi
- Year
- 2025
- Citations
- 1
Abstract
In robot-assisted minimally invasive surgery (RMIS), the smoke generated by energy-based surgical instruments blurs and obstructs the endoscopic surgical field, which increases the difficulty and risk of robotic surgery. However, current desmoking research primarily focuses on natural weather conditions, with limited studies addressing desmoking techniques for endoscopic images. Furthermore, surgical smoke presents a notably intricate morphology, and research efforts aimed at uniform, non-uniform, thin, and dense smoke remain relatively limited. This work proposes a Local-Global U-Shaped Transformer Model (LGUformer) based on the U-Net and Transformer architectures to remove complex smoke from endoscopic images. By introducing a local-global multi-head self-attention mechanism and multi-scale depthwise convolution, the proposed model enhances the inference capability. An enhanced feature map fusion method improves the quality of reconstructed images. The improved modules enable efficient handling of variable smoke while generating superior-quality images. Through desmoking experiments on synthetic and real smoke images, the LGUformer model demonstrated superior performance compared with seven other desmoking models in terms of accuracy, clarity, absence of distortion, and robustness. A task-based surgical instrument segmentation experiment indicated the potential of this model as a pre-processing step in visual tasks. Finally, an ablation study was conducted to verify the advantages of the proposed modules.
Keywords
Related papers
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Artificial intelligence: a modern approach
1995
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
A new optimizer using particle swarm theory
R.C. Eberhart, James Kennedy
2002