ARFF-VO: A Self-Supervised Monocular Visual Odometry With Adaptive Region-Based Feature Filtering in Dynamic Scenes

Guangdong Tong, Yuanyang Zhang, Zheng Li, Jiaru Sun

Year
2025
Citations
2

Abstract

Self-supervised monocular visual odometry enables robot localization and mapping without labeled data by training the network to minimize an image reconstruction loss. However, existing methods explicitly remove dynamic objects by introducing semantic masks, which limits their adaptability to dynamic pixels. In this paper, we propose ARFF-VO, which integrates a dynamic removal strategy into the network so that the model adaptively suppresses redundant information. To fully exploit non-redundant features, we introduce a Region-Structure Perception (R-SP) module that uses high-level semantic information to construct perception features and confidence scores. Additionally, we build the pose decoder from Vim blocks, whose core operator is a selective state space model. This design effectively compresses contextual information, enhancing long-sequence modeling capability. Furthermore, because monocular depth estimation and pose prediction are trained jointly, the improvement in visual odometry also benefits depth estimation. Evaluations on the KITTI dataset demonstrate that our method achieves superior performance compared to state-of-the-art self-supervised methods.
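
To make the training signal mentioned in the abstract concrete, the following is a minimal sketch of an image reconstruction (photometric) loss. It is not the loss used by ARFF-VO, which the abstract does not specify: it assumes a plain L1 error on grayscale images, and the function name `photometric_l1_loss` and its optional `weights` argument are hypothetical. The weights only illustrate how a per-pixel confidence map could, in principle, reduce the influence of pixels on moving objects.

```python
import numpy as np


def photometric_l1_loss(target, reconstructed, weights=None):
    """Mean absolute error between a target image and its reconstruction.

    target, reconstructed: arrays of identical shape (e.g. H x W grayscale).
    weights: optional non-negative per-pixel weights of the same shape.
             Hypothetical stand-in for a confidence map; not from the paper.
    """
    target = np.asarray(target, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    if target.shape != reconstructed.shape:
        raise ValueError("target and reconstructed must have the same shape")

    error = np.abs(target - reconstructed)  # per-pixel photometric error
    if weights is None:
        return float(error.mean())

    weights = np.asarray(weights, dtype=np.float64)
    if weights.shape != error.shape:
        raise ValueError("weights must have the same shape as the images")
    total = weights.sum()
    if total == 0.0:
        return 0.0  # every pixel is masked out, so nothing contributes
    # Weighted average: low-weight pixels contribute less to the loss.
    return float((weights * error).sum() / total)


if __name__ == "__main__":
    target = np.zeros((4, 4))
    reconstructed = np.zeros((4, 4))
    reconstructed[:2, :2] = 1.0          # region that fails to reconstruct
    confidence = np.ones((4, 4))
    confidence[:2, :2] = 0.0             # assume this region is judged dynamic

    print(photometric_l1_loss(target, reconstructed))
    print(photometric_l1_loss(target, reconstructed, confidence))
```

In this toy case the poorly reconstructed block dominates the unweighted loss, while giving it zero weight removes its contribution entirely. A learned, soft confidence map would sit between these two extremes.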

Keywords

Visual odometry, Monocular, Odometry, Feature (linguistics), Robot, Feature extraction, Visualization, Pose, Semantics (computer science)
