A framework for robotic manipulation tasks based on multiple zero shot models
Yifan Li, Peiyang Jiang, Chengpeng Chai, Xuyang Zhang, Chengguo Liu
- Publication year: 2025
- Citations: 1
- Access: Open access
Abstract
Humans tackle unknown tasks by integrating information from multiple sensory modalities. Existing robotic frameworks struggle to achieve effective multimodal manipulation, especially when sufficient training data is lacking. This study introduces "Panda Act", a novel robotic manipulation mechanism that leverages large language models (LLMs) and multimodal zero-shot models. The manipulation strategies are generated by LLMs as Python code, which dynamically orchestrates a suite of zero-shot visual and auditory models to fulfil task requirements. This enables robots to execute multimodal manipulations without requiring additional training. Extensive experiments in both simulated and real-world environments demonstrate that this approach excels in task comprehension, zero-shot execution, and adaptability, opening new avenues for enhancing robot adaptability in uncertain environments.
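The orchestration idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tool names (`detect_objects`, `classify_sound`, `pick`, `place`), the `Detection` type, the hard-coded scene, and the `fake_llm` function standing in for a real LLM call are all hypothetical, chosen only to show how a generated Python program might call zero-shot perception models and robot primitives.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    label: str
    position: Tuple[float, float]  # (x, y) in an assumed workspace frame


def detect_objects(prompt: str) -> List[Detection]:
    """Hypothetical stand-in for a zero-shot visual detector."""
    scene = [
        Detection("red cup", (0.30, 0.10)),
        Detection("blue box", (0.45, -0.20)),
    ]
    return [d for d in scene if prompt in d.label]


def classify_sound(candidates: List[str]) -> str:
    """Hypothetical stand-in for a zero-shot audio classifier."""
    return candidates[0]  # stub: always reports the first candidate


def fake_llm(instruction: str) -> str:
    """Stand-in for an LLM call; returns a fixed program for illustration."""
    return (
        'targets = detect_objects("cup")\n'
        'heard = classify_sound(["knock", "silence"])\n'
        'if heard == "knock" and targets:\n'
        '    pick(targets[0].position)\n'
        '    place((0.0, 0.0))\n'
    )


def run_task(instruction: str) -> list:
    """Generate a program for the instruction and run it against the tools."""
    log = []  # records the robot primitives the program invoked
    namespace = {
        # Emptying builtins only limits what the program can name;
        # it is NOT a security sandbox.
        "__builtins__": {},
        "detect_objects": detect_objects,
        "classify_sound": classify_sound,
        "pick": lambda pos: log.append(("pick", pos)),
        "place": lambda pos: log.append(("place", pos)),
    }
    exec(fake_llm(instruction), namespace)
    return log


if __name__ == "__main__":
    print(run_task("When you hear a knock, move the cup to the origin."))
```

In this sketch the generated program only sees the tools placed in its namespace, so swapping in a different zero-shot model means changing one dictionary entry rather than retraining anything.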