RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
Charles Xu, Qiyang Li, Jianlan Luo, Sergey Levine
- Year: 2025
- Citations: 5
- Access: Open access
Abstract
Fig. 1 (panel titles: Generalization to Unseen Scenarios; Composition for Long Horizon Tasks): RLDG improves generalist robot policies like OpenVLA and Octo by training specialist RL policies and using them to generate high-quality fine-tuning datasets. It has the flexibility to distill knowledge from multiple RL policies trained on individual narrowly scoped tasks into a single generalist. It can also be applied to the most critical sub-task of a long-horizon manipulation task, improving the success rate at the "bottleneck" while leveraging human demonstrations on parts of the task where it suffices. The resulting fine-tuned generalist policies are capable of precise manipulation, generalization to unseen scenarios, and composition of skills to solve long-horizon tasks.
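The data flow described above (specialist RL policies generate rollouts, the rollouts are pooled into one fine-tuning dataset, and a single generalist is trained on that dataset) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 1-D "reach the goal" task, the hand-written `make_specialist` stand-in for a trained RL policy, the success filter, and the linear least-squares fit standing in for fine-tuning a model such as OpenVLA or Octo.

```python
import random

# Hypothetical toy setup: a 1-D "reach the goal" task stands in for a
# manipulation task. None of these names come from the paper.

def make_specialist(goal, rng, noise=0.05):
    """Stand-in for an RL policy already trained on one narrowly scoped task."""
    def policy(x):
        # Move halfway toward the goal, plus a little action noise.
        return 0.5 * (goal - x) + rng.gauss(0.0, noise)
    return policy

def rollout(policy, goal, x0, steps=20, tol=0.1):
    """Run one episode; return (observation, action) pairs and a success flag."""
    x, pairs = x0, []
    for _ in range(steps):
        a = policy(x)
        pairs.append(((x, goal), a))  # observation = (state, task goal)
        x += a
    return pairs, abs(x - goal) < tol

def collect_dataset(goals, episodes_per_task, rng):
    """Pool successful specialist rollouts from every task into one dataset."""
    data = []
    for goal in goals:
        specialist = make_specialist(goal, rng)
        for _ in range(episodes_per_task):
            pairs, success = rollout(specialist, goal, rng.uniform(-1.0, 1.0))
            if success:  # assumed filter: keep only successful episodes
                data.extend(pairs)
    return data

def fit_generalist(data):
    """Behavior cloning: least-squares fit of a = w_x * x + w_g * goal."""
    sxx = sum(x * x for (x, g), _ in data)
    sxg = sum(x * g for (x, g), _ in data)
    sgg = sum(g * g for (x, g), _ in data)
    sxa = sum(x * a for (x, g), a in data)
    sga = sum(g * a for (x, g), a in data)
    det = sxx * sgg - sxg * sxg  # 2x2 normal equations solved by Cramer's rule
    w_x = (sxa * sgg - sga * sxg) / det
    w_g = (sga * sxx - sxa * sxg) / det
    return w_x, w_g
```

A possible usage, again purely illustrative: build the pooled dataset from two single-task specialists, then fit one policy that is conditioned on the goal.

```python
rng = random.Random(0)
data = collect_dataset(goals=[-0.5, 0.5], episodes_per_task=50, rng=rng)
w_x, w_g = fit_generalist(data)  # one set of weights serves both tasks
```

The point of the sketch is only the structure: the generalist never interacts with the environment during training and learns purely from data produced by the specialists.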