LOCOMOTION

Residual Policy Optimization With Trust Region Constraints: A Learning Framework for Stable and Agile Wheel-Legged Locomotion

Naifeng He, Xiaoliang Fan, W. Que, Siyang Liu, Hongyu Xu, Chunguang Bu, Bi Zhang

Year: 2025
Citations: 1

Abstract

Wheel-legged robots integrate the adaptability of legged locomotion with the efficiency of wheeled movement, enabling agile traversal across diverse terrains. However, abrupt terrain transitions introduce substantial state variations, including velocity fluctuations, posture shifts, and slippage, which pose significant challenges to locomotion stability. To address these issues, we propose a state error compensation framework that integrates a residual network with a trust-region mechanism. The residual network implicitly captures nonlinear contact dynamics, enabling real-time correction of slippage-induced state deviations, while the trust-region mechanism regulates compensation amplitude to maintain stable locomotion. Furthermore, we introduce a dual-source contrastive learning strategy, which explicitly differentiates terrain-induced transitions from external perturbations, facilitating context-aware error recovery. The proposed framework is integrated into a model-free reinforcement learning pipeline, ensuring adaptability to previously unseen environments. To further enhance robustness, an uncertainty-aware calibration module is introduced. This module dynamically adjusts the trust region boundary in real time, leveraging sensory feedback to adaptively constrain residual corrections and prevent over-adjustment, thereby maintaining stability during diverse terrain transitions. Experimental results demonstrate that the proposed framework achieves a 96.7% terrain traversal success rate and 92% velocity tracking accuracy under dynamic disturbances. On unstructured and mixed terrains, it maintains a mean velocity tracking error of 0.15 m/s and stable posture, with pitch and roll angles constrained to ±0.04 rad and ±0.02 rad, respectively.
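The abstract states that a trust region bounds the residual correction and that an uncertainty-aware module adjusts that bound from sensory feedback. It does not specify how either is parameterized. The sketch below is a minimal illustration under loudly stated assumptions, not the authors' method. It assumes an L2-ball trust region centered on the base policy's action, an ensemble of residual heads whose disagreement serves as the uncertainty signal, and an exponential schedule that shrinks the radius from `r_max` toward `r_min` as uncertainty grows. All function names and constants (`r_min`, `r_max`, `k`) are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Vector = List[float]


def project_to_trust_region(residual: Sequence[float], radius: float) -> Vector:
    """Project the residual onto an L2 ball of the given radius (assumed trust-region shape)."""
    norm = math.sqrt(sum(r * r for r in residual))
    if norm <= radius:
        return list(residual)  # already inside the trust region
    scale = radius / norm  # shrink along the same direction
    return [r * scale for r in residual]


def ensemble_uncertainty(predictions: Sequence[Sequence[float]]) -> float:
    """Hypothetical uncertainty proxy: mean per-dimension std across ensemble members."""
    return mean(pstdev(dim) for dim in zip(*predictions))


def calibrated_radius(sigma: float, r_min: float, r_max: float, k: float) -> float:
    """Hypothetical calibration: radius decays from r_max toward r_min as sigma grows."""
    return r_min + (r_max - r_min) * math.exp(-k * sigma)


def compose_action(
    base_action: Sequence[float],
    residual_ensemble: Sequence[Sequence[float]],
    r_min: float = 0.02,
    r_max: float = 0.2,
    k: float = 5.0,
) -> Tuple[Vector, float]:
    """Base action plus an uncertainty-bounded residual correction."""
    residual = [mean(dim) for dim in zip(*residual_ensemble)]  # ensemble-average residual
    sigma = ensemble_uncertainty(residual_ensemble)
    radius = calibrated_radius(sigma, r_min, r_max, k)
    bounded = project_to_trust_region(residual, radius)
    action = [a + r for a, r in zip(base_action, bounded)]
    return action, radius


if __name__ == "__main__":
    base = [0.1, -0.3, 0.5]
    # Heads agree: full radius r_max is available to the correction.
    print(compose_action(base, [[0.3, 0.0, 0.4], [0.3, 0.0, 0.4]]))
    # Heads disagree on the first joint: the radius shrinks, limiting over-adjustment.
    print(compose_action(base, [[0.3, 0.0, 0.4], [-0.3, 0.0, 0.4]]))
```

Projecting onto a norm ball, rather than clipping each joint independently, preserves the direction of the correction while limiting its magnitude. This is one plausible reading of "regulates compensation amplitude". Per-joint clipping would be an equally valid alternative under a different assumption.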

Keywords

Terrain, Residual, Adaptability, Reinforcement learning, Tree traversal, Robot, Agile software development, Compensation (psychology), Tracking error

