MambaSlip: A Novel Multimodal Large Language Model for Real-Time Robotic Slip Detection
Shaohua Zhang, Bingyi Mao, Fengda Zhao, Wenbai Chen, Guowei Gao, Peiliang Wu
- Publication year: 2025
- Citations: 2
Abstract
Current robotic slip detection methods lack an effective contextual reasoning mechanism, which leads to inaccurate decisions in unknown environments. To address this issue, we propose MambaSlip, which leverages the context-understanding and reasoning abilities of large language models (LLMs) to improve slip detection accuracy. First, we replace the conventional Transformer-based LLM with one built on a state-space model (SSM) architecture. Mamba's selective state-space design provides global receptive-field coverage and input-dependent (dynamic) weighting at a computational cost that grows linearly with sequence length, so the model can capture long-range multimodal dependencies while keeping inference fast. Second, we design a tactile-visual encoder tailored to the slip detection task. The encoder models image sequences in a one-dimensional causal manner, and a CLIP-style pretraining method improves its compatibility with the LLM. This design strengthens temporal modeling and improves modality alignment. Together, these components enable MambaSlip to fuse tactile and visual information effectively and make more accurate decisions in complex scenarios. Finally, we validate the model through real-world robotic experiments, in which it shows strong performance and generalization in practical applications.
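The abstract describes its two key mechanisms only in words. The sketch below is a hypothetical, minimal illustration of the general techniques it names, not MambaSlip's implementation. `selective_scan` is a single-channel, Mamba-style selective state-space recurrence (diagonal A, input-dependent step size Δ_t and projections B_t and C_t, simplified Euler discretization of B). It shows why one causal pass costs O(L·N), linear in sequence length L. `clip_contrastive_loss` is a standard symmetric InfoNCE objective of the kind used in CLIP, here assumed to pair tactile and visual embeddings. All function names, parameter names, and the temperature of 0.07 are assumptions chosen for illustration.

```python
import math

# Hypothetical sketch: not the MambaSlip code. Names, shapes, and
# simplifications (single channel, scalar projection weights) are assumptions.


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)); keeps the step size Delta_t positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, a, w_b, w_c, w_delta, b_delta):
    """Causal selective SSM over a 1-D sequence (one channel, N-dim state).

    xs: one scalar per time step (e.g. a per-frame feature).
    a: N negative diagonal entries of the continuous-time matrix A.
    w_b, w_c: N weights that make B_t and C_t depend on the current input.
    w_delta, b_delta: scalars that make Delta_t depend on the current input.
    A single pass over the sequence costs O(L * N), linear in length L.
    """
    n = len(a)
    h = [0.0] * n  # hidden state carried forward in time
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)  # input-dependent step size
        y = 0.0
        for i in range(n):
            a_bar = math.exp(delta * a[i])       # zero-order-hold decay of state
            b_bar = delta * (w_b[i] * x)         # simplified discretized B_t
            h[i] = a_bar * h[i] + b_bar * x      # update uses only current/past inputs
            y += (w_c[i] * x) * h[i]             # input-dependent readout C_t
        ys.append(y)
    return ys


def _normalize(v):
    # Scale a vector to unit length (cosine similarity via dot product).
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    return [c / norm for c in v]


def _cross_entropy(logits, target):
    # -log softmax(logits)[target], computed with the log-sum-exp trick.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def clip_contrastive_loss(tactile, visual, temperature=0.07):
    """Symmetric InfoNCE: the i-th tactile and i-th visual embedding form a positive pair."""
    t = [_normalize(u) for u in tactile]
    v = [_normalize(u) for u in visual]
    k = len(t)
    logits = [[sum(p * q for p, q in zip(t[i], v[j])) / temperature
               for j in range(k)] for i in range(k)]
    # Tactile -> visual: each row should pick its own column.
    loss_t2v = sum(_cross_entropy(logits[i], i) for i in range(k)) / k
    # Visual -> tactile: each column should pick its own row.
    loss_v2t = sum(_cross_entropy([logits[i][j] for i in range(k)], j)
                   for j in range(k)) / k
    return 0.5 * (loss_t2v + loss_v2t)
```

Because each output y_t is computed before any later input is read, changing a future frame never alters earlier outputs. This is the one-dimensional causal property the abstract refers to. A full Mamba layer would run many channels in parallel with learned projection weights and a hardware-efficient scan rather than Python loops.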