Efficient On-policy Visual-RL via Stochastic Decoupled Policy Gradient
Haoxiang You, Yilang Liu, Davis Zong, Qian Wang, Teeratham Vitchutripop, Qi Wang, Daniel Rakita, Ian Abraham
2026
Abstract
We present the stochastic decoupled policy gradient (SDPG), a lightweight visual reinforcement learning (RL) method that trains diverse visuomotor control policies end-to-end within a few hours on a single NVIDIA RTX 4080 GPU. SDPG estimates policy gradients via random perturbations of trajectory rollouts, requiring orders of magnitude fewer batch-rendered environments and substantially reducing compute and memory overhead. On visual MuJoCo benchmarks, SDPG consistently outperforms baseline methods in training time, memory usage, and rewards. Finally, to support future research, we introduce a suite of realistic visual robotics benchmarks spanning dexterous manipulation and challenging locomotion, and we demonstrate effective sim-to-real transfer on physical hardware.
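The abstract does not specify how SDPG forms its estimate, so the following is only a minimal sketch of the general idea of estimating a gradient from randomly perturbed rollouts, not the authors' algorithm. It swaps in a generic antithetic parameter-space perturbation estimator on a toy 1-D system with no rendering. Every name here (`rollout_return`, `perturbation_gradient`, `train`), the toy dynamics, and all hyperparameters are hypothetical choices made for illustration.

```python
import random


def rollout_return(theta, horizon=20):
    """Total reward of one rollout on a toy 1-D system.

    Hypothetical stand-in for a simulator; not a MuJoCo environment.
    """
    x, total = 1.0, 0.0
    for _ in range(horizon):
        u = theta[0] * x + theta[1]        # linear policy: gain and bias
        total += -(x * x) - 0.01 * u * u   # penalize distance from 0 and effort
        x = x + 0.1 * u                    # simple integrator dynamics
    return total


def perturbation_gradient(theta, rng, num_pairs=16, sigma=0.1):
    """Antithetic random-perturbation estimate of d(return)/d(theta).

    Generic zeroth-order estimator (assumption: this is NOT SDPG itself).
    """
    grad = [0.0] * len(theta)
    for _ in range(num_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = rollout_return([t + sigma * e for t, e in zip(theta, eps)])
        minus = rollout_return([t - sigma * e for t, e in zip(theta, eps)])
        scale = (plus - minus) / (2.0 * sigma * num_pairs)
        for i, e in enumerate(eps):
            grad[i] += scale * e
    return grad


def train(steps=100, lr=0.005, seed=0):
    """Plain gradient ascent on the return using the estimated gradient."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        g = perturbation_gradient(theta, rng)
        theta = [t + lr * gi for t, gi in zip(theta, g)]
    return theta


initial_return = rollout_return([0.0, 0.0])
final_return = rollout_return(train())
print(initial_return, final_return)
```

In this sketch each update uses only scalar returns from a small number of perturbed rollouts, which is the property that makes perturbation-style estimators attractive when each rollout is expensive to render. How SDPG perturbs rollouts and decouples the gradient is described in the paper, not here.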