EXPO-FT: Sample-Efficient Reinforcement Learning Finetuning for Vision-Language-Action Models
Perry Dong, Kuo-Han Hung, Tian Gao, Dorsa Sadigh, Chelsea Finn
2026
Abstract
Efficiently and reliably learning new tasks is a foundational challenge in robotics. Vision-Language-Action (VLA) models have demonstrated strong generalization across diverse manipulation tasks, yet pretrained policies consistently fall short of the reliability required for real-world deployment. Reinforcement learning (RL) finetuning offers a promising path to bridge this gap, but existing approaches either train from scratch without fully leveraging pretrained priors, or finetune VLAs without achieving the sample efficiency and success rates that practical deployment demands. We present EXPO-FT, a system for stable, sample-efficient RL finetuning of pretrained VLA policies that addresses both shortcomings. Our system solves a suite of challenging manipulation tasks, including routing string lights and inserting the plug to light them up, striking a pool ball into a pocket, and inserting a flower into a wine bottle, each requiring combinations of high precision, dynamic actions, and robustness to varied initial states. It achieves perfect task performance (30/30 successes) across all evaluated tasks within an average of 19.1 minutes of online robot data, outperforming both prior RL-from-scratch and VLA finetuning approaches. We release an open-source codebase with the aim of facilitating broader adoption of RL finetuning of VLA models in robotics.