GenCo: A Dual VLM Generate-Correct Framework for Adaptive Peg-in-Hole Robotics
Zhengxue Zhou, Satheeshkumar Veeramani, Hatem Fakhruldeen, Andrew I. Cooper
- Publication year: 2025
- Citations: 1
Abstract
Recent advances in Vision Language Models (VLMs) have broadened their use in robotics, from high-level task planning to low-level action control. Despite strong performance across a range of robotic tasks, including zero-shot scenarios, most VLM applications remain open-loop: they follow a plan-and-execute paradigm with no mechanism for assessing task completion. To address this limitation, we propose GenCo, a Generate-Correct framework that automates a peg-in-hole task on a UR5e robot. The framework pairs a VLM-based motion generator with a VLM-based motion expert; the two work together to refine and correct actions during task execution. Both VLM agents are fine-tuned from the pre-trained LLaVA model, which enhances adaptability and allows the approach to scale efficiently to diverse tasks. Our experiments demonstrate the adaptiveness of the framework: it improves the peg-in-hole success rate by 12.75% compared to a single-VLM open-loop method. Notably, in unseen scenarios the success rate increased by 15% for a triangular peg and by 17% for a random-shaped peg, underscoring the system's effectiveness on novel tasks. Adaptive testing under varied camera positions showed robust performance despite shifts in the visual input. The framework is also designed to be lightweight and efficient, facilitating broader adoption and practical deployment. Our code and model are available at https://github.com/Zhengxuez/generate_correct
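The difference between the open-loop plan-and-execute paradigm and a generate-correct loop can be illustrated with a minimal sketch. Everything below is a hypothetical toy, not the authors' implementation: the names `generate_correct_loop`, `Verdict`, and `ToyPeg` are invented for illustration, the two VLM agents are replaced by simple functions, and the peg misalignment is reduced to a single number in millimetres.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """Hypothetical output of the motion expert: is the task done, and if not, how to correct."""
    done: bool
    correction: Optional[float] = None


def generate_correct_loop(
    observe: Callable[[], float],
    generator: Callable[[float], float],
    expert: Callable[[float], Verdict],
    execute: Callable[[float], None],
    max_steps: int = 10,
) -> bool:
    """Closed loop: generate a motion, execute it, let the expert assess and correct."""
    for _ in range(max_steps):
        execute(generator(observe()))      # generator proposes a motion from the current view
        verdict = expert(observe())        # expert inspects the result of that motion
        if verdict.done:
            return True
        if verdict.correction is not None:
            execute(verdict.correction)    # apply the expert's corrective motion
    return False


class ToyPeg:
    """Toy stand-in for the robot and camera: state is the peg's lateral offset in mm."""

    def __init__(self, offset: float) -> None:
        self.offset = offset

    def observe(self) -> float:
        return self.offset

    def execute(self, dx: float) -> None:
        self.offset += dx


def toy_generator(offset: float) -> float:
    # Imperfect generator (assumption): removes only half of the observed offset.
    return -0.5 * offset


def toy_expert(offset: float) -> Verdict:
    # Expert (assumption): declares success inside a 0.5 mm tolerance, else suggests a correction.
    if abs(offset) < 0.5:
        return Verdict(done=True)
    return Verdict(done=False, correction=-0.5 * offset)


peg = ToyPeg(offset=8.0)
success = generate_correct_loop(peg.observe, toy_generator, toy_expert, peg.execute)
print(success, peg.offset)
```

In this toy, a single generate-and-execute pass leaves the peg outside the tolerance, while repeated assessment and correction bring it inside. The sketch only shows the control flow; it says nothing about how the actual agents exchange images, prompts, or motion commands.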