A framework for robotic manipulation tasks based on multiple zero shot models
Yifan Li, Peiyang Jiang, Chengpeng Chai, Xuyang Zhang, Chengguo Liu
- Year: 2025
- Citations: 1
- Access: Open access
Abstract
Humans tackle unfamiliar tasks by integrating information from multiple sensory modalities. Existing robotic frameworks struggle to achieve effective multimodal manipulation, especially when training data is scarce. This study introduces "Panda Act", a novel robotic manipulation framework that leverages large language models (LLMs) and multimodal zero-shot models. The LLM generates manipulation strategies as Python code, which dynamically orchestrates a suite of zero-shot visual and auditory models to fulfil task requirements. This enables robots to perform multimodal manipulation without any additional training. Extensive experiments in both simulated and real-world environments demonstrate that the approach excels in task comprehension, zero-shot execution, and adaptability, opening new avenues for improving robot adaptability in uncertain environments.
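The core idea in the abstract is an LLM writing Python that calls zero-shot perception models as tools. The sketch below illustrates that idea under stated assumptions. It is not Panda Act's implementation. The tool names `detect_object` and `classify_sound`, the `Robot` interface, and the scene data are all hypothetical stubs. The "generated" program is hand-written in place of a real LLM call.

```python
from dataclasses import dataclass, field


@dataclass
class Robot:
    """Hypothetical robot interface that just records issued commands."""
    log: list = field(default_factory=list)

    def move_to(self, xyz):
        self.log.append(("move_to", xyz))

    def grasp(self):
        self.log.append(("grasp",))


# Hypothetical stand-ins for zero-shot perception model outputs.
SCENE = {"red cup": (0.4, 0.1, 0.02), "alarm clock": (0.2, -0.3, 0.05)}
SOUNDS = {"ringing": "alarm clock"}


def detect_object(name):
    """Stub for an open-vocabulary visual detector: 3D position or None."""
    return SCENE.get(name)


def classify_sound(label):
    """Stub for a zero-shot audio model: maps a sound to its likely source."""
    return SOUNDS.get(label)


# Code an LLM might emit for "pick up whatever is ringing" (hand-written here).
GENERATED = '''
source = classify_sound("ringing")
pos = detect_object(source)
if pos is not None:
    robot.move_to(pos)
    robot.grasp()
'''


def run_strategy(code, robot):
    """Run generated code that can see only the whitelisted tools.

    Emptying __builtins__ narrows what the code can call, but it is NOT a
    security sandbox; a real system would need proper isolation.
    """
    scope = {
        "__builtins__": {},
        "detect_object": detect_object,
        "classify_sound": classify_sound,
        "robot": robot,
    }
    exec(code, scope)
    return robot.log


if __name__ == "__main__":
    print(run_strategy(GENERATED, Robot()))
    # [('move_to', (0.2, -0.3, 0.05)), ('grasp',)]
```

Keeping the exposed tool set small gives the generated program a narrow, well-defined vocabulary. New perception models can then be added by registering another function rather than retraining anything.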