RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning

Charles Xu, Qiyang Li, Jianlan Luo, Sergey Levine

Year published: 2025
Citations: 5
Access: Open access

Abstract

Fig. 1 (panels: Generalization to Unseen Scenarios; Composition for Long Horizon Tasks): RLDG improves generalist robot policies like OpenVLA and Octo by training specialist RL policies and using them to generate high-quality fine-tuning datasets. It has the flexibility to distill knowledge from multiple RL policies, each trained on an individual narrowly scoped task, into a single generalist. It can also be applied to the most critical sub-task of a long-horizon manipulation task, improving the success rate at the "bottleneck" while leveraging human demonstrations on the parts of the task where they suffice. The resulting fine-tuned generalist policies are capable of precise manipulation, generalization to unseen scenarios, and composition of skills to solve long-horizon tasks.
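The data-generation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the source gives no code, so the scalar observations and actions, the stand-in `rollout` function, the task names, and the linear "specialist" policies are all hypothetical placeholders. The sketch only shows the pooling idea, in which rollouts from several narrowly scoped specialists are merged into one dataset that a generalist could later be fine-tuned on.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: scalar observations and actions stand in for images and robot commands.
Policy = Callable[[float], float]


@dataclass
class Transition:
    task: str           # which specialist produced this sample
    observation: float  # what the policy saw
    action: float       # what the specialist did (the fine-tuning label)


def rollout(task: str, policy: Policy, steps: int, rng: random.Random) -> List[Transition]:
    """Hypothetical stand-in for an episode: sample observations, record the policy's actions."""
    observations = [rng.uniform(-1.0, 1.0) for _ in range(steps)]
    return [Transition(task, obs, policy(obs)) for obs in observations]


def build_distillation_dataset(
    specialists: Dict[str, Policy],
    episodes_per_task: int,
    steps: int,
    seed: int = 0,
) -> List[Transition]:
    """Pool rollouts from every specialist into a single shuffled dataset."""
    rng = random.Random(seed)
    dataset: List[Transition] = []
    for task, policy in specialists.items():
        for _ in range(episodes_per_task):
            dataset.extend(rollout(task, policy, steps, rng))
    rng.shuffle(dataset)  # mix tasks so one generalist sees all of them
    return dataset


# Hypothetical specialists; in practice each would be a trained RL policy.
specialists: Dict[str, Policy] = {
    "insert_connector": lambda obs: -0.5 * obs,
    "pick_and_place": lambda obs: 0.8 * obs,
}

dataset = build_distillation_dataset(specialists, episodes_per_task=3, steps=4)
```

For the long-horizon case mentioned above, the same pooled list could in principle also hold human-demonstration transitions for the sub-tasks where demonstrations suffice; how the two sources are mixed is not specified in this text.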

Keywords

Reinforcement learning; Distillation; Action (physics); Control (management); Process (computing); Supervisory control
