A tutorial note on collecting simulated data for vision-language-action models
Heran Wu, Zirun Zhou, Jingfeng Zhang
- 发表年份
- 2025
- 访问权限
- 开放获取
摘要
Traditional robotic systems typically decompose intelligence into independent modules for computer vision, natural language processing, and motion control. Vision-Language-Action (VLA) models fundamentally transform this approach by employing a single neural network that can simultaneously process visual observations, understand human instructions, and directly output robot actions -- all within a unified framework. However, these systems are highly dependent on high-quality training datasets that can capture the complex relationships between visual observations, language instructions, and robotic actions. This tutorial reviews three representative systems: the PyBullet simulation framework for flexible customized data generation, the LIBERO benchmark suite for standardized task definition and evaluation, and the RT-X dataset collection for large-scale multi-robot data acquisition. We demonstrated dataset generation approaches in PyBullet simulation and customized data collection within LIBERO, and provide an overview of the characteristics and roles of the RT-X dataset for large-scale multi-robot data acquisition.
关键词
相关论文
基于嵌入式语言模型的多机器人系统动态重构
Shokhikha Amalana Murdivien, Jongsu Park, Jumyung Um
Robotics and Computer-Integrated Manufacturing · 2026
基于大语言模型增强的多智能体强化学习的无人机博弈分层决策
Xinyu Dong, Bo Li, Guangyu Zhang 等 5 位作者
Aerospace Science and Technology · 2026
水下残骸区域多UUV协同覆盖搜索的编队优化与避碰决策方法
Haomiao Yu, Zeyuan Zhang, Yantian Ma
Robotics and Autonomous Systems · 2026
人在回路中的群体机器人:一种用于真实土壤测绘的仿生群体方法
Petras Swissler, Mohammadali Rashidioun, Nicholas Sahu 等 6 位作者
2026