Stereo Visual SLAM Using SuperPoint and SuperGlue Feature Detection, Tracking and Matching
Siwon Yoon, Soon-Yong Park
- Year: 2025
- Citations: 1
Abstract
This paper presents a novel stereo visual simultaneous localization and mapping (SLAM) method that uses SuperPoint and SuperGlue deep features. In visual odometry and SLAM research, conventional handcrafted feature extraction and tracking techniques are still employed in many real-time and domain-independent applications. A representative conventional visual odometry method, VINS-Fusion, tightly couples visual and inertial information to estimate the pose of fast aerial robots such as drones. However, VINS-Fusion often suffers from inaccuracies in 3D pose and translation scale estimation, caused mainly by failures in feature tracking and by stereo mismatching. To mitigate these problems, we propose replacing the conventional feature extraction in VINS-Fusion with the SuperPoint deep feature extraction network, which results in notable improvements in quantitative evaluations. Additionally, SuperGlue is employed for feature tracking and stereo matching, which yields a more accurate projection of 3D map points onto pixel coordinates across multiple images. Compared with conventional optical flow algorithms, SuperGlue, an attention-based graph neural network, improves both feature tracking and stereo matching performance. The proposed method was evaluated on two popular datasets, EuRoC MAV and KITTI Odometry, and achieved root-mean-squared error reductions of approximately 28% to 69% for loop-closed visual odometry. Qualitative evaluations in indoor parking spaces further demonstrate the improved performance of the proposed approach. The proposed approach achieves relatively accurate and robust pose estimation even when the visual data contain rapid or unstable camera motions and poor lighting conditions.
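The abstract reports its results as reductions in root-mean-squared error. As a minimal sketch of how such a figure can be computed, the Python snippet below evaluates a translation RMSE for two trajectories against ground truth and the relative reduction between them. The function names and the toy trajectories are hypothetical and not taken from the paper, and the sketch assumes the estimated positions are already time-associated and aligned with the ground truth, which a real evaluation would have to do first.

```python
import math


def translation_rmse(estimated, ground_truth):
    """RMSE of per-pose Euclidean position errors (trajectories assumed aligned)."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        math.dist(est, gt) ** 2 for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmse_reduction_percent(baseline_rmse, proposed_rmse):
    """Relative RMSE reduction of the proposed method over a baseline, in percent."""
    if baseline_rmse <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (baseline_rmse - proposed_rmse) / baseline_rmse


# Toy positions in metres: illustrative values only, not results from the paper.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
baseline = [(0.0, 0.2, 0.0), (1.0, 0.2, 0.0), (2.0, 0.2, 0.0)]
proposed = [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0), (2.0, 0.1, 0.0)]

baseline_rmse = translation_rmse(baseline, ground_truth)
proposed_rmse = translation_rmse(proposed, ground_truth)
reduction = rmse_reduction_percent(baseline_rmse, proposed_rmse)
print(f"baseline RMSE: {baseline_rmse:.3f} m")
print(f"proposed RMSE: {proposed_rmse:.3f} m")
print(f"reduction: {reduction:.1f} %")
```

Under this definition, a reduction of 28% means the proposed method's RMSE is 72% of the baseline's on the same sequence.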