MDFusion: Multi-Dimension Semantic–Spatial Feature Fusion for LiDAR–Camera 3D Object Detection
Renzhong Qiao, Hao Yuan, Wenbo Zhang
- Publication year
- 2025
- Citation count
- 3
- Access
- Open access
Abstract
Accurate 3D object detection is increasingly vital for robust perception systems, particularly in applications such as autonomous vehicles and robotics. Many existing approaches rely on bird’s-eye-view (BEV) feature maps for multi-modal interaction, because BEV representations enable efficient operations. However, the inherent sparsity of LiDAR BEV features often leads to misalignment with the dense semantic information in camera images, resulting in suboptimal fusion quality and degraded detection performance, especially in complex and dynamic environments. To mitigate these issues, this paper proposes a multi-dimension semantic–spatial feature fusion (MDFusion) method that combines LiDAR and image features in both 2D and 3D spaces. Specifically, image semantic features are extracted with the DeepLabV3 segmentation network, which captures rich contextual information, and are then aligned with LiDAR point cloud voxel features through a summation operation to achieve precise semantic fusion. Additionally, LiDAR BEV features are fused with downsampled image features in 2D space via concatenation and spatially adaptive dilated convolution. This mechanism dynamically adjusts to the spatial characteristics of the data, ensuring robust feature integration. Extensive experiments on the KITTI and ONCE datasets demonstrate that the method achieves competitive performance in complex scenes, significantly improving multi-modal fusion quality and detection accuracy while maintaining computational efficiency.
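The two fusion steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, tensor layouts, and the precomputed voxel-to-pixel indices `pixel_uv` are assumptions made for illustration, and the convolution uses a single fixed dilation rate rather than the spatially adaptive mechanism the paper describes.

```python
import numpy as np


def fuse_voxel_semantics(voxel_feats, sem_feats, pixel_uv):
    """Summation fusion: add image semantic features to voxel features.

    voxel_feats: (N, C) float array of LiDAR voxel features.
    sem_feats:   (H, W, C) float array of per-pixel semantic features.
    pixel_uv:    (N, 2) int array of (column, row) pixel indices, assumed to
                 come from a camera projection computed elsewhere.
    """
    h, w, _ = sem_feats.shape
    u, v = pixel_uv[:, 0], pixel_uv[:, 1]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    fused = voxel_feats.copy()
    # Voxels that project outside the image keep their LiDAR-only features.
    fused[inside] += sem_feats[v[inside], u[inside]]
    return fused


def dilated_conv2d(x, weight, dilation):
    """Zero-padded, same-size 2D dilated convolution (cross-correlation).

    x:      (C_in, H, W) input feature map.
    weight: (C_out, C_in, K, K) kernel with odd K.
    """
    c_out, _, k, _ = weight.shape
    pad = dilation * (k // 2)
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, w = x.shape
    out = np.zeros((c_out, h, w), dtype=np.result_type(x, weight))
    for i in range(k):
        for j in range(k):
            # Shifted view of the input for kernel tap (i, j).
            patch = xp[:, i * dilation:i * dilation + h,
                       j * dilation:j * dilation + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, i, j], patch)
    return out


def fuse_bev(lidar_bev, image_bev, weight, dilation=2):
    """Concatenate BEV maps along channels, then apply a dilated convolution."""
    stacked = np.concatenate([lidar_bev, image_bev], axis=0)
    return dilated_conv2d(stacked, weight, dilation)
```

Summation keeps the voxel feature dimension unchanged, which is why the semantic features must share the channel count `C` in this sketch; the concatenation path instead doubles the channels and leaves the mixing to the convolution weights.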