MANIPULATION

RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning

Charles Xu, Qiyang Li, Jianlan Luo, Sergey Levine

Year: 2025
Citations: 5
Access: Open access

Abstract

Fig. 1 (panels: Generalization to Unseen Scenarios; Composition for Long Horizon Tasks): RLDG improves generalist robot policies such as OpenVLA and Octo by training specialist RL policies and using them to generate high-quality fine-tuning datasets. It can distill knowledge from multiple RL policies, each trained on an individual, narrowly scoped task, into a single generalist. It can also be applied to the most critical sub-task of a long-horizon manipulation task, improving the success rate at the "bottleneck" while relying on human demonstrations for the parts of the task where they suffice. The resulting fine-tuned generalist policies are capable of precise manipulation, generalization to unseen scenarios, and composition of skills to solve long-horizon tasks.
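The data flow described above (specialist policies generate rollouts, the rollouts are pooled into one dataset, and a single policy is fit to that dataset) can be illustrated with a toy sketch. This is not the RLDG implementation. The 1-D reaching task, the hand-written proportional controllers standing in for trained RL specialists, the success-only filter, and the least-squares "fine-tuning" of a one-parameter policy are all assumptions made for illustration, and every name (`rollout`, `collect_dataset`, `fit_gain`) is hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical toy types: an observation and an action are both single floats.
Transition = Tuple[float, float]      # (observation, action)
Policy = Callable[[float], float]


def rollout(policy: Policy, target: float, rng: random.Random,
            horizon: int = 20) -> Tuple[List[Transition], bool]:
    """Run one episode of a toy 1-D reaching task; return (transitions, success)."""
    pos = rng.uniform(-1.0, 1.0)
    transitions: List[Transition] = []
    for _ in range(horizon):
        obs = target - pos            # observation: signed distance to the goal
        action = policy(obs)
        transitions.append((obs, action))
        pos += action
        if abs(target - pos) < 0.05:  # close enough to the goal counts as success
            return transitions, True
    return transitions, False


def collect_dataset(specialists: Dict[float, Policy], episodes_per_task: int,
                    seed: int = 0) -> List[Transition]:
    """Pool transitions from every specialist, keeping successful episodes only
    (the success filter is an assumption of this sketch)."""
    rng = random.Random(seed)
    dataset: List[Transition] = []
    for target, policy in specialists.items():
        for _ in range(episodes_per_task):
            transitions, success = rollout(policy, target, rng)
            if success:
                dataset.extend(transitions)
    return dataset


def fit_gain(dataset: List[Transition]) -> float:
    """Stand-in for fine-tuning: least-squares fit of action = k * observation."""
    numerator = sum(obs * act for obs, act in dataset)
    denominator = sum(obs * obs for obs, _ in dataset)
    return numerator / denominator


# Two stand-in "specialists", one per task (keyed by goal position).
specialists = {0.3: lambda obs: 0.5 * obs, -0.4: lambda obs: 0.6 * obs}
data = collect_dataset(specialists, episodes_per_task=10)
print(f"{len(data)} transitions, distilled gain = {fit_gain(data):.3f}")
```

In this toy setting the single fitted gain lands between the two specialists' gains, which mirrors only the structure of distilling several narrow policies into one; a real generalist such as OpenVLA or Octo would be fine-tuned on image and language inputs with its own training code.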

Keywords

Reinforcement learning, Distillation, Action (physics), Control (management), Process (computing), Supervisory control
