PERCEPTION

Deep Learning-Based Multi-Modal Fusion for Robust Robot Perception and Navigation

Delun Lai, Yeyubei Zhang, Yunchong Liu, Chaojie Li, Huadong Mo

Year: 2025
Access: Open access

Abstract

This paper introduces a novel deep learning-based multimodal fusion architecture that enhances the perception capabilities of autonomous navigation robots in complex environments. The system integrates RGB images and LiDAR data through feature extraction modules, adaptive fusion strategies, and time-series modeling mechanisms. The key contributions are: (a) a lightweight feature extraction network that enhances feature representation; (b) an adaptive weighted cross-modal fusion strategy that improves system robustness; and (c) time-series modeling that improves perception accuracy in dynamic scenes. Experimental results on the KITTI dataset show that the proposed approach increases navigation and positioning accuracy by 3.5% and 2.2%, respectively, while maintaining real-time performance. This work provides a novel solution for autonomous robot navigation in complex environments.
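To make the idea of adaptive weighted cross-modal fusion concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each modality has already been encoded into a fixed-length feature vector and that a small linear gate (the hypothetical parameters `w_gate` and `b_gate`, which would be learned in practice) produces one weight per modality per sample. The function name `adaptive_fusion`, the feature sizes, and the random inputs are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)


def adaptive_fusion(rgb_feat, lidar_feat, w_gate, b_gate):
    """Fuse two (batch, dim) feature arrays with per-sample modality weights.

    w_gate has shape (2 * dim, 2) and b_gate has shape (2,). Both are
    hypothetical gate parameters, random here and learned in a real system.
    """
    # Concatenate both modalities so the gate sees the joint evidence.
    joint = np.concatenate([rgb_feat, lidar_feat], axis=-1)  # (batch, 2*dim)
    # One score per modality, normalized so the two weights sum to 1.
    weights = softmax(joint @ w_gate + b_gate, axis=-1)      # (batch, 2)
    # Weighted sum of the two feature vectors, broadcast over dim.
    fused = weights[:, :1] * rgb_feat + weights[:, 1:] * lidar_feat
    return fused, weights


# Illustrative inputs: random stand-ins for encoded RGB and LiDAR features.
rng = np.random.default_rng(0)
batch, dim = 4, 8
rgb = rng.normal(size=(batch, dim))
lidar = rng.normal(size=(batch, dim))
w_gate = rng.normal(scale=0.1, size=(2 * dim, 2))
b_gate = np.zeros(2)

fused, weights = adaptive_fusion(rgb, lidar, w_gate, b_gate)
print(fused.shape, weights.sum(axis=1))
```

Because the weights are computed per sample, a gate of this kind could in principle down-weight a degraded modality (for example, RGB features from a poorly lit frame) on a case-by-case basis. Whether the paper's fusion strategy uses this exact gating form is not stated in the abstract.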

Keywords

cs.LG, cs.CV, cs.RO
