StarVLA-$α$: Reducing Complexity in Vision-Language-Action Systems
Jinhui Ye, Ning Gao, Senqiao Yang, Jinliang Zheng, Zixuan Wang, Yuxin Chen, Pengguang Chen, Yilun Chen, Shu Liu, Jiaya Jia
- Publication year
- 2026
- Access
- Open access
Abstract
Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for building general-purpose robotic agents. However, the VLA landscape remains highly fragmented and complex, as existing approaches vary substantially in architectures, training data, embodiment configurations, and benchmark-specific engineering. In this work, we introduce StarVLA-$α$, a simple yet strong baseline designed to study VLA design choices under controlled conditions. StarVLA-$α$ deliberately minimizes architectural and pipeline complexity to reduce experimental confounders and enable systematic analysis. Specifically, we re-evaluate several key design axes, including action modeling strategies, robot-specific pretraining, and interface engineering. Under unified multi-benchmark training on LIBERO, SimplerEnv, RoboTwin, and RoboCasa, the same simple baseline remains highly competitive, indicating that a strong VLM backbone combined with a minimal design is already sufficient to achieve strong performance without additional architectural complexity or engineering tricks. Notably, our single generalist model outperforms $π_{0.5}$ by 20% on the public real-world RoboChallenge benchmark. We expect StarVLA-$α$ to serve as a solid starting point for future research in the VLA regime. Code will be released at https://github.com/starVLA/starVLA.