Temporal Pyramid Alignment and Adaptive Fusion of Event Stream and Image Frame for Keypoint Detection and Tracking in Autonomous Driving
P. Shi, Chee‐Onn Chow, Wei Ru Wong
- Year: 2025
- Citations: 2
Abstract
This paper proposes a method to address the alignment and fusion challenges in multimodal fusion between event and RGB cameras. For multimodal alignment, we adopt a Temporal Pyramid Alignment mechanism to achieve multi-scale temporal synchronization of event streams and RGB frames. For multimodal fusion, we design an adaptive fusion module that dynamically adjusts the contribution of each modality based on scene complexity and feature quality. A gating network computes the fusion weights from both the relative importance of each modality and its noise characteristics. A Cross-Modal Feature Compensation module is integrated into the framework to enhance information utilization. The framework also incorporates a Dynamic Inference Path Selection mechanism, guided by input complexity, to optimize computational resource allocation, along with a dynamic noise suppression mechanism to improve the robustness of feature extraction. Experimental results on the DSEC dataset show that the proposed method achieves 36.9% mAP and a 40.1% tracking success rate (SR), surpassing existing approaches by 1.8% mAP and 1.6% SR while maintaining real-time efficiency at 13.1 FPS. The method is particularly effective in extreme lighting and fast motion scenarios. This work provides a solution for applications in autonomous driving, robotics, and augmented reality, where robust multimodal perception under dynamic conditions is critical.
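The adaptive fusion idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the paper's gating network is learned, whereas the gate below is a hand-written stand-in that scores each modality as quality minus a noise penalty and normalizes the scores with a softmax. The names `gate_weights`, `fuse`, and `noise_penalty`, and the scalar quality and noise inputs, are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(quality, noise, noise_penalty=1.0):
    """Hypothetical gate: one weight per modality, summing to 1.

    quality and noise are per-modality scalars (assumed inputs);
    a higher noise estimate lowers that modality's weight.
    """
    scores = [q - noise_penalty * n for q, n in zip(quality, noise)]
    return softmax(scores)


def fuse(event_feat, frame_feat, quality, noise):
    """Weighted sum of event and frame feature vectors of equal length."""
    w_event, w_frame = gate_weights(quality, noise)
    return [w_event * e + w_frame * f for e, f in zip(event_feat, frame_feat)]


if __name__ == "__main__":
    # Order is [event, frame]. The frame is assumed noisy here
    # (e.g. low light), so the event features should dominate.
    print(gate_weights(quality=[0.8, 0.8], noise=[0.1, 0.9]))
    print(fuse([1.0, 2.0], [3.0, 4.0], quality=[0.8, 0.8], noise=[0.1, 0.9]))
```

With equal quality and noise estimates the two modalities are averaged; as one modality's noise estimate grows, its contribution shrinks smoothly rather than being switched off.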