LOCOMOTION

LocoMamba: Vision-Driven Locomotion via End-to-End Deep Reinforcement Learning with Mamba

Yinuo Wang, Gavin Tao

Year: 2025
Access: Open access

Abstract

We introduce LocoMamba, a vision-driven cross-modal DRL framework built on selective state-space models, specifically leveraging Mamba, that achieves near-linear-time sequence modeling, effectively captures long-range dependencies, and enables efficient training with longer sequences. First, we embed proprioceptive states with a multilayer perceptron and patchify depth images with a lightweight convolutional neural network, producing compact tokens that improve state representation. Second, stacked Mamba layers fuse these tokens via near-linear-time selective scanning, reducing latency and memory footprint, remaining robust to token length and image resolution, and providing an inductive bias that mitigates overfitting. Third, we train the policy end-to-end with Proximal Policy Optimization under terrain and appearance randomization and an obstacle-density curriculum, using a compact state-centric reward that balances progress, smoothness, and safety. We evaluate our method in challenging simulated environments with static and moving obstacles as well as uneven terrain. Compared with state-of-the-art baselines, our method achieves higher returns and success rates with fewer collisions, exhibits stronger generalization to unseen terrains and obstacle densities, and improves training efficiency by converging in fewer updates under the same compute budget.
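The token-fusion step described in the abstract can be illustrated with a minimal sketch of a selective state-space recurrence. This is not the authors' implementation: it is a simplified, pure-Python version of a Mamba-style scan with a diagonal state matrix, and every name in it (`selective_scan`, `init_params`, `W_B`, `W_C`, `w_dt`, `b_dt`) is hypothetical. The proprioceptive and depth tokens are random placeholder vectors standing in for the outputs of the MLP and CNN encoders, and the weights are random rather than learned. The point is only to show why the cost grows linearly with the number of tokens: each token updates a fixed-size hidden state exactly once.

```python
import math
import random


def softplus(v):
    # Numerically stable softplus; keeps the step size positive.
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def matvec(W, x):
    # Plain matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def selective_scan(tokens, params):
    """Simplified selective SSM recurrence (assumed form, diagonal A).

    tokens: list of T vectors of length D.
    Returns T output vectors of length D. Cost is O(T * D * N).
    """
    A, W_B, W_C = params["A"], params["W_B"], params["W_C"]
    w_dt, b_dt = params["w_dt"], params["b_dt"]
    D, N = len(A), len(A[0])
    h = [[0.0] * N for _ in range(D)]  # one N-dim hidden state per channel
    outputs = []
    for x in tokens:
        B = matvec(W_B, x)  # input-dependent input projection, length N
        C = matvec(W_C, x)  # input-dependent readout, length N
        y = []
        for d in range(D):
            # Input-dependent step size: this is the "selective" part.
            dt = softplus(w_dt[d] * x[d] + b_dt[d])
            for n in range(N):
                decay = math.exp(dt * A[d][n])  # A < 0, so 0 < decay < 1
                h[d][n] = decay * h[d][n] + dt * B[n] * x[d]
            y.append(sum(C[n] * h[d][n] for n in range(N)))
        outputs.append(y)
    return outputs


def init_params(D, N, seed=0):
    # Random stand-in weights; a trained policy would learn these.
    rng = random.Random(seed)
    return {
        "A": [[-(n + 1.0) for n in range(N)] for _ in range(D)],
        "W_B": [[rng.gauss(0, 0.5) for _ in range(D)] for _ in range(N)],
        "W_C": [[rng.gauss(0, 0.5) for _ in range(D)] for _ in range(N)],
        "w_dt": [rng.gauss(0, 0.5) for _ in range(D)],
        "b_dt": [0.0] * D,
    }


# Placeholder tokens: one proprioceptive token followed by five depth-patch tokens.
D, N = 4, 3
rng = random.Random(1)
proprio_token = [rng.uniform(-1, 1) for _ in range(D)]
depth_tokens = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(5)]
tokens = [proprio_token] + depth_tokens

fused = selective_scan(tokens, init_params(D, N))
```

Because the hidden state has a fixed size, doubling the number of depth patches doubles the work of the scan, whereas full self-attention over the same tokens would scale quadratically. The sketch omits the gating, convolution, and hardware-aware parallel scan of a real Mamba layer.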

Keywords

cs.RO, cs.AI, cs.CV, eess.IV, eess.SY
