MambaSlip: A Novel Multimodal Large Language Model for Real-Time Robotic Slip Detection

Shaohua Zhang, Bingyi Mao, Fengda Zhao, Wenbai Chen, Guowei Gao, Peiliang Wu

Year: 2025
Citations: 2

Abstract

Current robotic slip detection methods lack an effective contextual reasoning mechanism, which leads to inaccurate decisions in unknown environments. To address this issue, we propose MambaSlip, which leverages the strengths of large language models (LLMs) in context understanding and reasoning to significantly improve slip detection accuracy. First, we replace the traditional Transformer-based LLM with one based on a state-space model (SSM) architecture. This design achieves linear complexity while retaining Mamba's global receptive field and dynamic weighting, so it handles long-range multimodal dependencies effectively while maintaining fast inference. Second, we design a tactile-visual encoder specifically adapted to the slip detection task. We model image sequences in a one-dimensional causal manner and improve the compatibility between the tactile-visual encoder and the LLM through a CLIP-style pretraining method. This design improves both temporal modeling and modality alignment. Together, these innovations enable MambaSlip to fuse tactile and visual information effectively for more accurate decisions in complex scenarios. Finally, we validate the model through real-world robotic experiments. The results demonstrate strong performance and generalization in practical applications.
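The abstract does not spell out the CLIP-style pretraining objective. As a rough illustration only, the sketch below computes the standard symmetric contrastive loss used in CLIP-style training over a batch of paired embeddings. The function names, the toy two-dimensional embeddings, the temperature of 0.07, and the pairing of tactile-visual embeddings with text embeddings are all assumptions made for this example, not details taken from the paper.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    # Numerically stable log-softmax over one list of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_style_loss(sensor_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    sensor_emb[i] and text_emb[i] are assumed to describe the same sample;
    every other pairing in the batch is treated as a negative.
    """
    a = [l2_normalize(v) for v in sensor_emb]
    b = [l2_normalize(v) for v in text_emb]
    n = len(a)
    # Cosine similarities scaled by the temperature.
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Sensor-to-text direction: row i should select column i.
    loss_s2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-sensor direction: column j should select row j.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2s = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_s2t + loss_t2s)


# Toy batch: correctly paired embeddings versus deliberately mismatched ones.
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is small when each embedding is most similar to its own partner and large when partners are mismatched, which is the sense in which such an objective encourages modality alignment.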

Keywords

Slip (aerodynamics); Computer science; Artificial intelligence; Engineering; Aerospace engineering
