MANIPULATION

A framework for robotic manipulation tasks based on multiple zero shot models

Yifan Li, Peiyang Jiang, Chengpeng Chai, Xuyang Zhang, Chengguo Liu

Publication year
2025
Citations
1
Access
Open access

Abstract

Humans tackle unknown tasks by integrating information from multiple sensory modalities. Existing robotic frameworks struggle to achieve effective multimodal manipulation, especially when sufficient training data is lacking. This study introduces "Panda Act", a novel robotic manipulation mechanism that leverages large language models (LLMs) and multimodal zero-shot models. The manipulation strategies are generated by LLMs as Python code, which dynamically orchestrates a suite of zero-shot visual and auditory models to fulfil task requirements. This enables robots to execute multimodal manipulations without requiring additional training. Extensive experiments in both simulated and real-world environments demonstrate that this approach excels in task comprehension, zero-shot execution, and adaptability, opening new avenues for enhancing robot adaptability in uncertain environments.
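
The abstract describes LLM-generated Python code that orchestrates zero-shot visual and auditory models. A minimal sketch of that pattern, under stated assumptions, might look like the following. The `ToolKit` class, its `detect`, `listen`, `pick`, and `place` methods, and the `GENERATED_POLICY` string are all hypothetical stand-ins invented for illustration; they are not Panda Act's actual interface, and the perception methods are simple stubs rather than real models.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    label: str
    position: tuple  # (x, y, z) in metres, robot base frame (assumed convention)


@dataclass
class ToolKit:
    """Hypothetical tool interface exposed to LLM-generated code."""
    scene: list  # stand-in for what a zero-shot detector would see
    heard: str   # stand-in for what a zero-shot speech model would hear
    log: list = field(default_factory=list)

    def detect(self, query):
        # Stub for an open-vocabulary detector: substring match on the label.
        return [d for d in self.scene if query in d.label]

    def listen(self):
        # Stub for a zero-shot speech recogniser.
        return self.heard

    def pick(self, position):
        # Record the motion command instead of driving a real arm.
        self.log.append(("pick", position))

    def place(self, position):
        self.log.append(("place", position))


# Stand-in for the Python source an LLM might return for the instruction
# "put the object I name into the bowl". Hand-written here, not model output.
GENERATED_POLICY = '''
target_name = tools.listen()
targets = tools.detect(target_name)
bowls = tools.detect("bowl")
if targets and bowls:
    tools.pick(targets[0].position)
    tools.place(bowls[0].position)
'''


def run_policy(source, tools):
    # Run the generated code with only the tool interface in scope.
    # Emptying __builtins__ narrows the namespace but is NOT a security sandbox.
    namespace = {"__builtins__": {}, "tools": tools}
    exec(source, namespace)
    return tools.log


if __name__ == "__main__":
    tools = ToolKit(
        scene=[Detection("red apple", (0.4, 0.1, 0.02)),
               Detection("bowl", (0.5, -0.2, 0.0))],
        heard="apple",
    )
    print(run_policy(GENERATED_POLICY, tools))
```

The point of the design is that the generated code only sequences tool calls; swapping a stub for a real zero-shot model would leave the generated policy unchanged, which is one plausible reading of how no additional training is needed.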

Keywords

Computer science, Shot (pellet), Zero (linguistics), Artificial intelligence, Human–computer interaction, Chemistry
