SURGICAL

PLDKD-Net: Pixel-Level Discriminative Knowledge Distillation for Surgical Scene Segmentation With Graph-Based Visual Parsing

Bo Lu, X. L. Zheng, Zhenjie Zhu, Ziyi Wang, Bruce X. B. Yu, Mingchuan Zhou, Peng Qi, Huicong Liu, Yunhui Liu, Lining Sun

Publication year
2025
Citations
2

Abstract

Efficient laparoscopic scene segmentation holds significant potential for surgical assistive intelligence and image-guided task autonomy in robotic surgery. However, the abdominal cavity with intricate tissues and surgical tools under varying conditions challenges the balance between segmentation accuracy and efficiency. To resolve this problem, we propose PLDKD-Net, a novel pixel-level student-teacher knowledge distillation (KD) framework, in which the student model selectively distills the teacher’s profound knowledge while exploring rich visual features with a graph-based fusion mechanism for efficient segmentation. Specifically, we first introduce our confidence-based KD (Confi-KD) scheme, in which a pixel-level confidence generator (PCG) is proposed to assess the teacher’s performance by discriminatively evaluating its probability map and the raw image, generating a confidence map that can facilitate a selective KD for the student model. To balance the model’s accuracy and efficiency, we devise a novel heterogeneous student architecture with a bi-stream visual parsing pipeline to capture multi-scale and inter-spatial visual features. These features are then fused using a relational graph convolutional network (RGCN), which can adaptively tune the fusion degrees of multi-latent knowledge, ensuring visual parsing completeness while avoiding computational redundancy. We extensively validate PLDKD-Net on two public laparoscopic benchmarks, Endovis18 and CholecSeg8K, and in-house surgical videos. Benefiting from our schemes, the experimental outcomes demonstrate superior quantitative and qualitative performance compared to state-of-the-art methods. With the selective KD mechanism, our model yields competitive or even higher performance than the cumbersome teacher model while exhibiting quasi-real-time efficiency, which demonstrates its greater potential for intelligent robotic surgical scene understanding.
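The selective distillation idea can be illustrated with a minimal sketch. The abstract does not give the loss used in Confi-KD, so everything below is an assumption made for illustration: the function name `confidence_weighted_kd_loss`, the choice of a per-pixel KL divergence between teacher and student class distributions, and the normalization by total confidence are all hypothetical. The confidence map is taken as a given input here; in the paper it is produced by the pixel-level confidence generator (PCG), which this sketch does not model.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def confidence_weighted_kd_loss(student_logits, teacher_logits, confidence, eps=1e-8):
    """Hypothetical confidence-weighted pixel-wise distillation loss.

    student_logits, teacher_logits: arrays of shape (C, H, W).
    confidence: array of shape (H, W) with values in [0, 1]; assumed to
        score how trustworthy the teacher is at each pixel.
    """
    p_teacher = softmax(teacher_logits, axis=0)
    p_student = softmax(student_logits, axis=0)

    # Per-pixel KL(teacher || student), summed over the class axis.
    kl = (p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps))).sum(axis=0)

    # Pixels with low teacher confidence contribute little to the loss;
    # dividing by the total confidence keeps the scale comparable across images.
    return float((confidence * kl).sum() / (confidence.sum() + eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(3, 4, 4))   # 3 classes, 4x4 image
    student = rng.normal(size=(3, 4, 4))
    conf = rng.uniform(size=(4, 4))        # stand-in for a PCG output
    print(confidence_weighted_kd_loss(student, teacher, conf))
```

Under these assumptions, a confidence of zero at a pixel removes the teacher's influence there, so the student would rely on its other training signals (for example, a ground-truth segmentation loss) for that pixel.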

Keywords

Parsing, Discriminative model, Artificial intelligence, Computer science, Segmentation, Pixel, Image segmentation, Graph, Computer vision, Distillation
