首页 /研究 /CollaBot: Vision-Language Guided Simultaneous Collaborative Manipulation
SWARM

CollaBot: Vision-Language Guided Simultaneous Collaborative Manipulation

Kun Song, Gaoming Chen, Shentao Ma, Ninglong Jin, Guangbao Zhao, Mingyu Ding, Zhenhua Xiong, Jia Pan

发表年份
2025
访问权限
开放获取

摘要

One central goal of robotics is to enable robots to interact with the physical world. Traditional manipulation studies primarily focus on single robots and relatively small objects. However, factory and domestic environments often require large-object manipulation, such as moving tables, where multiple robots must work collaboratively. Existing studies still lack a generalizable framework that can handle diverse objects, tasks, and robot team sizes. In this work, we propose CollaBot, a generalist framework for simultaneous collaborative manipulation. First, we use SEEM for scene segmentation and target-object extraction. Then, we propose a collaborative grasping framework that decomposes the task into local grasp pose generation and global coordination. Finally, we design a two-stage planning module to generate collision-free trajectories for task execution. Experimental results across different settings with varying objects, tasks, and numbers of robots indicate that our framework achieves a 72% success rate. This marks a substantial improvement over behavior cloning-based methods, validating the advantages of the proposed framework in complex multi-robot cooperative tasks. Real-world experiments further demonstrate the feasibility of our method in practical applications.

关键词

cs.RO

相关论文

查看 SWARM 分类全部论文