Bridging the Gap: Enabling Soft Actor Critic for High Performance Legged Locomotion
Gianluca Sabatini, Chenhao Li, Marco Hutter
2026
Abstract
Proximal Policy Optimization (PPO) has become the de facto standard for training legged robots, thanks to its robustness and scalability in massively parallel simulation environments like IsaacLab. However, its on-policy nature makes it inherently sample-inefficient, preventing its use for continuous adaptation and fine-tuning on real hardware. Soft Actor-Critic (SAC), by contrast, is an off-policy algorithm that can reuse past experience, making it a natural candidate for sim-to-real transfer workflows where the same algorithm can be used both in simulation and for online learning on the real robot. Despite these advantages, SAC has consistently failed to match PPO's empirical performance in massively parallel training settings. This work identifies the root causes of this gap and introduces targeted modifications, covering policy initialization, timeout-aware critic targets, and multi-step return estimation, that enable SAC to train stably at scale. Evaluated across multiple legged robot platforms and diverse locomotion tasks, our approach closes the performance gap with PPO entirely.
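The abstract names timeout-aware critic targets and multi-step return estimation but does not give their exact form. The sketch below shows one common way these two ideas combine, and it is an assumption for illustration, not the authors' implementation. The function name `n_step_soft_target` and all of its parameters are hypothetical. It assumes `next_soft_values[k]` already holds a soft value estimate of the state reached after step k (for SAC, typically the minimum target Q-value minus the entropy term), that `terminated` marks true failures, and that `truncated` marks episode timeouts.

```python
def n_step_soft_target(rewards, terminated, truncated, next_soft_values, gamma=0.99):
    """Illustrative n-step critic target for one trajectory segment.

    Hypothetical helper: rewards are accumulated with discounting until the
    segment ends. A true termination stops the return with no bootstrap; a
    timeout (truncation) or the end of the n-step window bootstraps from the
    soft value of the next state.
    """
    target = 0.0
    discount = 1.0
    last = len(rewards) - 1
    for k in range(len(rewards)):
        target += discount * rewards[k]
        discount *= gamma
        if terminated[k]:
            # True failure: the episode really ended, so no future value is added.
            return target
        if truncated[k] or k == last:
            # Timeout or window end: the task would have continued, so bootstrap.
            return target + discount * next_soft_values[k]
    return target  # only reached for an empty segment
```

Under these assumptions, a timeout is treated like the end of the n-step window rather than like a failure, so the critic is not taught that value drops to zero merely because the episode clock ran out.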