MANIPULATION

A Symmetry-Informed Multimodal LLM-Driven Approach to Robotic Object Manipulation: Lowering Entry Barriers in Mechatronics Education

Jorge Gudiño-Lau, Miguel Durán-Fonseca, Luis Anido, Pedro C. Santana‐Mancilla

Year published: 2025
Citations: 1
Access: Open access

Abstract

The integration of Large Language Models (LLMs), particularly Visual Language Models (VLMs), into robotics promises more intuitive human–robot interactions; however, challenges remain in efficiently translating high-level commands into precise physical actions. This paper presents a novel architecture for vision-based object manipulation that leverages a VLM’s reasoning capabilities while incorporating symmetry principles to enhance operational efficiency. Implemented on a Yahboom DOFBOT educational robot with a Jetson Nano platform, our system introduces a prompt-based framework that uniquely embeds symmetry-related cues to streamline feature extraction and object detection from visual data. This methodology, which utilizes few-shot learning, enables the VLM to generate more accurate and contextually relevant commands for manipulation tasks by efficiently interpreting the symmetric and asymmetric features of objects. The experimental results in controlled scenarios demonstrate that our symmetry-informed approach significantly improves the robot’s interaction efficiency and decision-making accuracy compared to generic prompting strategies. This work contributes a robust method for integrating fundamental vision principles into modern generative AI workflows for robotics. Furthermore, its implementation on an accessible educational platform shows its potential to simplify complex robotics concepts for engineering education and research.
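The abstract describes a pipeline that prompts a VLM with symmetry cues and few-shot examples and then turns the model's answer into a manipulation command. The following is a minimal sketch of that flow under stated assumptions; it is not the authors' prompt or code. The hint wording, the example scenes, the JSON command schema, and the names `build_prompt`, `parse_command`, `Command`, and `ALLOWED_ACTIONS` are all hypothetical, and the actual VLM call (which would also receive the camera frame) is omitted.

```python
import json
from dataclasses import dataclass

# Hypothetical few-shot examples: each pairs a scene description that
# carries a symmetry cue with the JSON command we would want back.
FEW_SHOT_EXAMPLES = [
    (
        "A red cube (symmetric about both axes) lies left of a blue cylinder.",
        {"action": "pick", "object": "red cube", "grasp": "center"},
    ),
    (
        "A yellow L-shaped block (asymmetric) lies near the gripper.",
        {"action": "pick", "object": "yellow L-shaped block", "grasp": "long arm"},
    ),
]

# Hypothetical symmetry cue prepended to every prompt.
SYMMETRY_HINT = (
    "Before answering, note whether each object is symmetric or asymmetric. "
    "Symmetric objects can be grasped at their center; for asymmetric objects, "
    "choose a grasp point on a distinctive feature."
)

# Hypothetical whitelist of actions the arm controller accepts.
ALLOWED_ACTIONS = {"pick", "place", "push"}


@dataclass
class Command:
    action: str
    obj: str
    grasp: str


def build_prompt(user_request: str) -> str:
    """Assemble a text prompt: symmetry hint, few-shot pairs, then the request."""
    parts = [SYMMETRY_HINT, ""]
    for scene, command in FEW_SHOT_EXAMPLES:
        parts.append(f"Scene: {scene}")
        parts.append(f"Command: {json.dumps(command)}")
        parts.append("")
    parts.append(f"Request: {user_request}")
    parts.append("Command:")
    return "\n".join(parts)


def parse_command(reply: str) -> Command:
    """Parse and validate the model's JSON reply before it reaches the arm."""
    data = json.loads(reply)
    action = data.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    # Default to a center grasp when the model omits the grasp field.
    return Command(action=action, obj=data["object"], grasp=data.get("grasp", "center"))


if __name__ == "__main__":
    print(build_prompt("Pick up the green sphere."))
    print(parse_command('{"action": "pick", "object": "green sphere"}'))
```

Checking the reply against a small whitelist of actions before anything is sent to the robot is one plausible design choice here: a malformed or unexpected answer raises an error instead of moving the arm.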

Keywords

Robotics, Object (grammar), Workflow, Robot, Mechatronics, Generative grammar
