Home /Research /CROSS-GAiT: Cross-Attention-Based Multimodal Representation Fusion for Parametric Gait Adaptation in Complex Terrains

LOCOMOTION

CROSS-GAiT: Cross-Attention-Based Multimodal Representation Fusion for Parametric Gait Adaptation in Complex Terrains

Gershom Seneviratne, Kasun Weerakoon, Mohamed Elnoor, Vaishnav Rajgopal, Harshavarthan Varatharajan, Mohamed Khalid M Jaffar, Jason Pusey, Dinesh Manocha

Year: 2025
Citations: 2

Abstract

We present CROSS-GAiT, a novel algorithm for quadruped robots that uses Cross Attention to fuse terrain representations derived from visual and time-series inputs; including linear accelerations, angular velocities, and joint efforts. These fused representations are used to continuously adjust two critical gait parameters (step height and hip splay), enabling adaptive gaits that respond dynamically to varying terrain conditions. To generate terrain representations, we process visual inputs through a masked Vision Transformer (ViT) encoder and time-series data through a dilated causal convolutional encoder. The Cross Attention mechanism then selects and integrates the most relevant features from each modality, combining terrain characteristics with robot dynamics for informed gait adaptation. This fused representation allows CROSS-GAiT to continuously adjust gait parameters in response to unpredictable terrain conditions in real-time. We train CROSS-GAiT on a diverse set of terrains including asphalt, concrete, brick pavements, grass, dense vegetation, pebbles, gravel, and sand and validate its generalization ability on unseen environments. Our hardware implementation on the Ghost Robotics Vision 60 demonstrates superior performance in challenging terrains, such as high-density vegetation, unstable surfaces, sandbanks, and deformable substrates. We observe at least a 7.04% reduction in IMU energy density and a 27.3% reduction in total joint effort, which directly correlates with increased stability and reduced energy usage when compared to state-of-the-art methods. Furthermore, CROSS-GAiT demonstrates at least a 64.5% increase in success rate and a 4.91% reduction in time to reach the goal in four complex scenarios. Additionally, the learned representations perform 4.48% better than the state-of-the-art on a terrain classification task.

Keywords

TerrainRobotGaitRoboticsEncoderReduction (mathematics)Representation (politics)Humanoid robot

CROSS-GAiT: Cross-Attention-Based Multimodal Representation Fusion for Parametric Gait Adaptation in Complex Terrains

Abstract

Keywords

Related papers

Artificial intelligence: a modern approach

Are we ready for autonomous driving? The KITTI vision benchmark suite

Self-Organizing Maps

Vision meets robotics: The KITTI dataset