Cross-modal State Space Modeling for Real-time RGB-thermal Wild Scene Semantic Segmentation

Xiaodong Guo, Lingling Hu, Zhihong Deng, Tong Liu, Wujie Zhou

Year
2025
Citations
2

Abstract

Integrating RGB and thermal data can significantly improve semantic segmentation for field robots operating in wild environments. However, processing multi-source data (e.g., with Transformer-based approaches) imposes substantial computational overhead, which is a challenge for resource-constrained systems. To address this limitation, we introduce CM-SSM, an efficient RGB-thermal semantic segmentation architecture built on cross-modal state space modeling (SSM). The framework has two key components. First, a cross-modal 2D-selective-scan (CM-SS2D) module establishes SSM between the RGB and thermal modalities: it constructs cross-modal visual sequences and derives the hidden-state representations of one modality from the other. Second, a cross-modal state space association (CM-SSA) module integrates the global associations from CM-SS2D with local spatial features extracted by convolutional operations. Unlike Transformer-based approaches, CM-SSM has linear computational complexity with respect to image resolution. Experimental results show that CM-SSM achieves state-of-the-art performance on the CART dataset with fewer parameters and lower computational cost. Further experiments on the PST900 dataset demonstrate its generalizability. The code is available at https://github.com/xiaodonguo/CMSSM.
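The abstract does not give the scan equations, so the following is only a minimal NumPy sketch, under our own assumptions, of how a cross-modal selective scan could derive one modality's hidden states from the other. It assumes a simplified Mamba-style recurrence h_t = exp(Δ_t A) h_{t-1} + Δ_t B_t x_t, y_t = C_t h_t + D x_t, in which the input x comes from one modality while the input-dependent parameters Δ, B and C are computed from the other, and it assumes four raster scan orders for the 2D case. All function names, parameter keys and the initialization are hypothetical and are not taken from the CMSSM repository.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps the step sizes positive.
    return np.logaddexp(0.0, x)


def init_params(D, N, rng):
    # Hypothetical parameter set (random weights, S4D-real-style A), shared by all scans.
    return {
        "A_log": np.log(np.tile(np.arange(1, N + 1, dtype=float), (D, 1))),  # (D, N)
        "W_delta": 0.1 * rng.standard_normal((D, D)),
        "W_B": 0.1 * rng.standard_normal((D, N)),
        "W_C": 0.1 * rng.standard_normal((D, N)),
        "D_skip": np.ones(D),
    }


def cross_modal_selective_scan(x_in, x_ctrl, params):
    """Selective scan driven by x_in, with delta, B and C computed from x_ctrl.

    Assumption: this is how "deriving one modality's hidden states from the
    other" is modeled here. x_in, x_ctrl: (L, D). Returns y: (L, D).
    """
    A = -np.exp(params["A_log"])                  # (D, N), negative so states decay
    delta = softplus(x_ctrl @ params["W_delta"])  # (L, D) per-token step sizes
    B = x_ctrl @ params["W_B"]                    # (L, N) input projections
    C = x_ctrl @ params["W_C"]                    # (L, N) output projections
    L, D = x_in.shape
    h = np.zeros_like(A)                          # hidden state (D, N)
    y = np.empty((L, D))
    for t in range(L):                            # one pass: O(L * D * N)
        dA = np.exp(delta[t][:, None] * A)        # zero-order-hold decay
        dB = delta[t][:, None] * B[t][None, :]    # simplified (Euler) input term
        h = dA * h + dB * x_in[t][:, None]
        y[t] = h @ C[t] + params["D_skip"] * x_in[t]
    return y


def four_way_orders(H, W):
    # Row-major, reversed row-major, column-major and reversed column-major scans.
    idx = np.arange(H * W).reshape(H, W)
    row, col = idx.reshape(-1), idx.T.reshape(-1)
    return [row, row[::-1], col, col[::-1]]


def cross_modal_ss2d(rgb, thermal, params):
    """rgb, thermal: (H, W, D) feature maps. Returns the thermal-conditioned RGB
    map and the RGB-conditioned thermal map, each summed over four scan orders."""
    H, W, D = rgb.shape
    rgb_seq, th_seq = rgb.reshape(H * W, D), thermal.reshape(H * W, D)
    out_rgb, out_th = np.zeros((H * W, D)), np.zeros((H * W, D))
    for order in four_way_orders(H, W):
        # Each order is a permutation, so indexing with it scatters outputs back.
        out_rgb[order] += cross_modal_selective_scan(rgb_seq[order], th_seq[order], params)
        out_th[order] += cross_modal_selective_scan(th_seq[order], rgb_seq[order], params)
    return out_rgb.reshape(H, W, D), out_th.reshape(H, W, D)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, D, N = 4, 5, 8, 16
    params = init_params(D, N, rng)
    rgb = rng.standard_normal((H, W, D))
    thermal = rng.standard_normal((H, W, D))
    fused_rgb, fused_th = cross_modal_ss2d(rgb, thermal, params)
    print(fused_rgb.shape, fused_th.shape)  # (4, 5, 8) (4, 5, 8)
```

Each scan visits every token once, so the cost grows as O(L·D·N) for L = H·W tokens, which is the linear dependence on image resolution that the abstract contrasts with Transformers; global self-attention over the same tokens instead builds an L×L score matrix. The Python loop is for readability only; a practical implementation would use a parallel or fused scan kernel.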

Keywords

Segmentation; RGB color model; Computational complexity theory; Pattern recognition (psychology); Key (lock); Image segmentation; Convolutional neural network; Semantics (computer science); Field (mathematics)