GenCo: A Dual VLM Generate-Correct Framework for Adaptive Peg-in-Hole Robotics

Zhengxue Zhou, Satheeshkumar Veeramani, Hatem Fakhruldeen, Andrew I. Cooper

Publication year
2025
Citations
1

Abstract

Recent advances in Vision Language Models (VLMs) have broadened their use in robotics, spanning both high-level task planning and low-level action control. Despite their strong performance across various robotic tasks, even in zero-shot scenarios, most VLM applications remain open-loop: they follow a plan-and-execute paradigm with no mechanism for assessing task completion. To address this limitation, we propose GenCo, a Generate-Correct framework designed to automate a peg-in-hole task using a UR5e robot. The framework integrates a VLM-based motion generator and a VLM-based motion expert, which work together to refine and correct actions during robotic task execution. Both VLM agents are fine-tuned from the pre-trained LLaVA model, enhancing adaptability and scaling efficiently to diverse tasks. Our experiments demonstrate the adaptiveness of the framework, improving the success rate for the peg-in-hole task by 12.75% compared to a single-VLM open-loop method. Notably, in unseen scenarios, the success rate increased by 15% for a triangular peg and by 17% for a random-shaped peg, underscoring the system's effectiveness in handling novel tasks. Adaptive testing under varied camera positions demonstrated robust performance, affirming reliability despite shifts in the visual input. The framework is also designed to be lightweight and efficient, facilitating broader adoption and practical deployment. Our code and model are available at: https://github.com/Zhengxuez/generate_correct
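
To make the contrast with open-loop plan-and-execute concrete, the following is a minimal Python sketch of a closed generate-correct loop. It is not the authors' implementation: the names generate_correct_loop, Verdict and Motion, the (dx, dy, dz) motion format, and the additive correction rule are all assumptions made for illustration. The two VLM agents are passed in as plain callables, so no model or robot API is implied.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical motion format: an end-effector offset (dx, dy, dz) in metres.
Motion = Tuple[float, float, float]


@dataclass
class Verdict:
    """Hypothetical output of the motion expert."""
    done: bool          # expert judges the task to be complete
    correction: Motion  # offset the expert adds to the proposed motion


def generate_correct_loop(
    observe: Callable[[], object],
    generator: Callable[[object], Motion],
    expert: Callable[[object, Motion], Verdict],
    execute: Callable[[Motion], None],
    max_steps: int = 10,
) -> bool:
    """Run a closed loop: observe, generate, correct, execute.

    Returns True if the expert reports completion within max_steps,
    otherwise False. An open-loop baseline would call generator and
    execute once, with no expert and no completion check.
    """
    for _ in range(max_steps):
        observation = observe()
        proposed = generator(observation)        # generator proposes a motion
        verdict = expert(observation, proposed)  # expert reviews the proposal
        if verdict.done:
            return True
        # Assumed correction rule: add the expert's offset to the proposal.
        corrected = tuple(p + c for p, c in zip(proposed, verdict.correction))
        execute(corrected)
    return False
```

In a real system, observe would return a camera image, generator and expert would wrap the two fine-tuned models, and execute would send the motion to the robot controller. The sketch fixes only the control flow, so each of those pieces can be swapped without touching the loop.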

Keywords

Dual (grammatical number), Artificial intelligence, Computer science, Robotics, Machine learning, Robot, Art
