Efficient On-policy Visual-RL via Stochastic Decoupled Policy Gradient

Haoxiang You, Yilang Liu, Davis Zong, Qian Wang, Teeratham Vitchutripop, Qi Wang, Daniel Rakita, Ian Abraham

Year: 2026
Citations: 0
Access: Open access

Abstract

We present the stochastic decoupled policy gradient (SDPG), a lightweight visual reinforcement learning (RL) method that trains diverse visuomotor control policies end-to-end within a few hours on a single NVIDIA RTX 4080 GPU. SDPG estimates policy gradients via random perturbations of trajectory rollouts, requiring orders of magnitude fewer batch-rendered environments and substantially reducing compute and memory overhead. On visual MuJoCo benchmarks, SDPG consistently outperforms baseline methods in training time, memory usage, and rewards. Finally, to support future research, we introduce a suite of realistic visual robotics benchmarks spanning dexterous manipulation and challenging locomotion, and we demonstrate effective sim-to-real transfer on physical hardware.
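
As a rough illustration of what estimating a gradient from random perturbations can look like, the sketch below implements a generic antithetic zeroth-order estimator on a toy objective. It is not the SDPG algorithm, whose details the abstract does not give. The names `perturbation_gradient`, `toy_return`, and `optimize`, the quadratic stand-in for a rollout return, and all hyperparameters are assumptions made for this example.

```python
import numpy as np


def perturbation_gradient(objective, theta, num_pairs=64, sigma=0.1, rng=None):
    """Estimate the gradient of `objective` at `theta` from random perturbations.

    Generic antithetic (two-sided) zeroth-order estimator; an illustrative
    assumption, not the estimator used by SDPG.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(theta, dtype=float)
    for _ in range(num_pairs):
        eps = rng.standard_normal(theta.shape)  # random perturbation direction
        # Finite difference of the objective along +eps and -eps.
        delta = objective(theta + sigma * eps) - objective(theta - sigma * eps)
        grad += delta / (2.0 * sigma) * eps
    return grad / num_pairs


def toy_return(theta):
    # Hypothetical stand-in for a rollout return, maximized at theta = [1, -2].
    target = np.array([1.0, -2.0])
    return -np.sum((theta - target) ** 2)


def optimize(steps=200, lr=0.05, seed=0):
    # Plain gradient ascent using only objective evaluations.
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    for _ in range(steps):
        theta = theta + lr * perturbation_gradient(toy_return, theta, rng=rng)
    return theta
```

Calling `optimize()` drives `theta` toward the maximizer of the toy objective using function evaluations alone. In a visual RL setting each evaluation would be a rendered rollout, which is why the number of perturbations per update directly affects rendering cost and memory.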

Keywords

visual reinforcement learning, policy gradient, sample efficiency, sim-to-real