Improving Generalization of Language-Conditioned Robot Manipulation

Chenglin Cui, Chaoran Zhu, Changjae Oh, Andrea Cavallaro

Publication year
2025
Access
Open access

Abstract

Robot control for manipulation tasks generally relies on visual input. Recent advances in vision-language models (VLMs) enable natural language instructions to condition visual input, allowing robots to be controlled in a wider range of environments. However, existing methods require a large amount of data to fine-tune VLMs for operating in unseen environments. In this paper, we present a two-stage framework that learns object-arrangement tasks from just a few demonstrations. The framework divides each object-arrangement task into a target localization stage, for picking the object, and a region determination stage, for placing the object. We present an instance-level semantic fusion module that aligns instance-level image crops with the text embedding, enabling the model to identify the target objects defined by the natural language instructions. We validate our method in both simulated and real-world robotic environments. Our method, fine-tuned with a few demonstrations, improves generalization capability and demonstrates zero-shot ability in real-robot manipulation scenarios.
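
As a rough illustration of the idea of matching instance-level crops against a text embedding, the sketch below scores each crop embedding by cosine similarity with the instruction embedding and picks the best match. This is not the authors' fusion module, which the abstract does not specify. The function names, the use of cosine similarity, and the toy 3-dimensional vectors are all assumptions made for illustration; in practice the embeddings would come from a VLM's image and text encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_target_instance(crop_embeddings, text_embedding):
    """Hypothetical helper: return (index of best-matching crop, all scores).

    Assumes each object instance in the scene has already been cropped and
    embedded, and that the instruction has been embedded in the same space.
    """
    if not crop_embeddings:
        raise ValueError("at least one crop embedding is required")
    scores = [cosine_similarity(c, text_embedding) for c in crop_embeddings]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Toy embeddings (made up for illustration): three object crops, one instruction.
crops = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
text = [0.0, 1.0, 0.0]

best_index, scores = select_target_instance(crops, text)
print(best_index, scores)
```

In a two-stage setup like the one described, a selection of this kind would correspond loosely to the target localization stage; a similar scoring step over candidate regions could stand in for region determination, though that too is an assumption rather than something the abstract states.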

Keywords

cs.RO, cs.CV
