PERCEPTION

Stereo Visual SLAM Using SuperPoint and SuperGlue Feature Detection, Tracking and Matching

Siwon Yoon, Soon-Yong Park

Year: 2025
Citations: 1

Abstract

This paper presents a novel stereo visual simultaneous localization and mapping (SLAM) method that uses SuperPoint and SuperGlue deep features. In visual odometry and SLAM research, conventional handcrafted feature extraction and tracking techniques are still employed in many real-time and domain-independent applications. As a representative conventional visual odometry method, VINS-Fusion tightly couples visual and inertial information to solve the pose estimation problem for fast aerial robots such as drones. However, VINS-Fusion often suffers from inaccuracies in 3D pose and translation scale estimation, which are mainly caused by failures in feature tracking and stereo mismatching. To mitigate these problems, we propose replacing the conventional feature extraction in VINS-Fusion with the SuperPoint deep feature extraction network, which results in notable improvements in quantitative evaluations. Additionally, SuperGlue is employed for feature tracking and stereo matching, which yields a more accurate projection of 3D map points onto pixel coordinates across multiple images. Compared with conventional optical flow algorithms, SuperGlue, an attention-based graph neural network, improves both feature tracking and stereo matching performance. The proposed method was evaluated on two popular datasets, EuRoC MAV and KITTI Odometry, and achieved root-mean-square error (RMSE) reductions of approximately 28% to 69% for loop-closed visual odometry. Qualitative evaluations in indoor parking spaces further demonstrate the improved performance of the proposed approach. The proposed approach achieves relatively accurate and robust pose estimation even when the visual data contain rapid or unstable camera motion and poor lighting conditions.
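To make the RMSE reductions quoted in the abstract concrete, the following is a minimal sketch of how such a figure could be computed from trajectories. It is not the authors' evaluation code: the function names `trajectory_rmse` and `rmse_reduction_percent` are hypothetical, the trajectories are toy values rather than EuRoC MAV or KITTI results, and the sketch assumes the estimated and ground-truth positions are already time-associated and expressed in the same frame.

```python
import math


def trajectory_rmse(estimated, ground_truth):
    """RMSE of per-pose translation error.

    Assumes both trajectories are already aligned to a common frame and
    matched pose-by-pose (same length, same order).
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmse_reduction_percent(baseline_rmse, proposed_rmse):
    """Relative RMSE reduction of a proposed method against a baseline."""
    if baseline_rmse <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (baseline_rmse - proposed_rmse) / baseline_rmse


# Toy positions in metres (illustrative only, not data from the paper).
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
baseline = [(0.0, 0.0, 0.0), (1.0, 0.4, 0.0), (2.0, 0.0, 0.4)]
proposed = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (2.0, 0.0, 0.2)]

baseline_rmse = trajectory_rmse(baseline, ground_truth)
proposed_rmse = trajectory_rmse(proposed, ground_truth)
reduction = rmse_reduction_percent(baseline_rmse, proposed_rmse)
print(f"baseline RMSE: {baseline_rmse:.3f} m")
print(f"proposed RMSE: {proposed_rmse:.3f} m")
print(f"reduction: {reduction:.1f}%")
```

In practice, benchmark evaluations usually align the estimated trajectory to the ground truth before computing the error; that step is omitted here for brevity.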

Keywords

Visual odometry, Feature (linguistics), Feature extraction, Pose, Simultaneous localization and mapping, Odometry, Tracking (education), Matching (statistics), Stereopsis, Pixel
