GLNet-YOLO: Multimodal Feature Fusion for Pedestrian Detection

Yi Zhang, Qing Zhao, Yang Shen, Jinhe Ran, Shu Gui, Haiyan Zhang, Xiuhe Li, Zhen Zhang

Year: 2025
Citations: 2
Access: Open access

Abstract

Pedestrian detection is important in applications such as intelligent surveillance, autonomous driving, and robot navigation. However, single-modal images struggle to support high-precision detection in complex environments. To address this, this study proposes GLNet-YOLO, a framework based on cross-modal deep feature fusion that aims to improve pedestrian detection in complex environments by fusing features from visible-light and infrared images. The framework extends the YOLOv11 architecture with a dual-branch network that processes the visible-light and infrared inputs separately. It introduces the FM module for global feature fusion and enhancement, and the DMR module for local feature separation and interaction. On the LLVIP dataset, the fused model improves mAP@50 by 9.2% over the visible-light-only YOLOv11 baseline and by 0.7% over the infrared-only YOLOv11 baseline. The fusion thus improves detection accuracy under low-light and complex-background conditions and makes the algorithm more robust; its effectiveness is further verified on the KAIST dataset.
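The mAP@50 figures above count a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. As a rough illustration of that metric for a single class such as pedestrians, here is a minimal sketch in plain Python. The function names `iou` and `average_precision_50`, the `(x1, y1, x2, y2)` box format, and the greedy matching rule are assumptions made for this example; it is not the evaluation code used in the paper, and benchmark toolkits differ in details such as interpolation and how duplicate detections are handled.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision_50(predictions, ground_truth, iou_threshold=0.5):
    """Single-class AP at a fixed IoU threshold (simplified, hypothetical helper).

    predictions: list of (confidence, box) pairs.
    ground_truth: list of boxes.
    """
    if not ground_truth:
        return 0.0

    matched = set()  # indices of ground-truth boxes already claimed
    hits = []        # True for a true positive, False for a false positive

    # Visit predictions from most to least confident.
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            hits.append(True)
        else:
            hits.append(False)

    # Precision and recall after each ranked prediction.
    precisions, recalls, true_positives = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        true_positives += int(hit)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / len(ground_truth))

    # Make precision non-increasing from left to right, then integrate over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [
        (0.9, (0, 0, 10, 10)),    # exact match with the first box
        (0.8, (50, 50, 60, 60)),  # no overlap, counted as a false positive
        (0.7, (20, 20, 30, 28)),  # IoU 0.8 with the second box
    ]
    print(round(average_precision_50(preds, gts), 4))
```

With several classes, mAP@50 would be the mean of this per-class value; for a pedestrian-only benchmark the two coincide.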

Keywords

Pedestrian detection; Robustness (evolution); Feature (linguistics); Pedestrian; Object detection; Feature extraction; Process (computing); Pattern recognition (psychology)
