MANIPULATION

6-DoF Grasp Detection Method Based on Vision Language Guidance

Xixing Li, Rui Wu, Tao Liu

Publication year
2025
Citations
1
Access
Open access

Abstract

Interactive grasping lets a robot grasp the object that the user selects. Most deep-learning-based interactive grasping methods combine a visual language model with a grasp detection model. In existing methods, however, the visual language model has weak trainability and generalization ability, and the robot copes poorly with small target objects. This paper therefore proposes a 6-DoF grasp detection method guided by visual language: it takes a text instruction and an RGB-D image of the scene as input and outputs the 6-DoF grasp pose of the object that the instruction refers to. To improve the trainability and feature extraction ability of the visual language model, a multi-head attention mechanism combined with hybrid normalization is designed. In addition, a local attention mechanism is introduced into the grasp detection model to strengthen the interaction between global and local information in the point cloud data, which improves the model's ability to grasp small target objects. The proposed method works in three steps: the improved visual language model first predicts the planar position of the target object, the improved grasp detection model then predicts all graspable poses in the scene, and the planar position is finally used to select the graspable poses that belong to the target object. The proposed visual language model and grasp detection model achieve excellent performance across various scenarios of public datasets while retaining a degree of generalization ability. In real-world grasping experiments, the proposed 6-DoF grasp detection method based on visual language guidance achieved a grasp success rate of 95%.
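The final step of the pipeline, using the planar position to select grasps for the target object, can be illustrated with a minimal sketch. The abstract does not specify how this filtering is done, so everything below is an assumption for illustration: the planar position is taken to be a 2D bounding box in pixel coordinates, each grasp is reduced to its 3D center in the camera frame, and a pinhole camera model with hypothetical intrinsics `(fx, fy, cx, cy)` projects centers onto the image. The function names are likewise hypothetical and are not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# Assumed pinhole intrinsics: (fx, fy, cx, cy), in pixels.
Intrinsics = Tuple[float, float, float, float]


def project_point(point: Sequence[float], intr: Intrinsics) -> Tuple[float, float]:
    """Project a 3D camera-frame point (x, y, z) to pixel coordinates (u, v)."""
    x, y, z = point
    fx, fy, cx, cy = intr
    return fx * x / z + cx, fy * y / z + cy


def filter_grasps_by_box(
    centers: Sequence[Sequence[float]],
    scores: Sequence[float],
    box_xyxy: Tuple[float, float, float, float],
    intr: Intrinsics,
) -> List[int]:
    """Return indices of grasps whose projected center lies inside the box.

    The result is sorted by descending grasp score, so the first index is
    the highest-scoring candidate on the target object.
    """
    x_min, y_min, x_max, y_max = box_xyxy
    kept = []
    for i, center in enumerate(centers):
        if center[2] <= 0:  # behind the camera: cannot be projected
            continue
        u, v = project_point(center, intr)
        if x_min <= u <= x_max and y_min <= v <= y_max:
            kept.append(i)
    kept.sort(key=lambda i: scores[i], reverse=True)
    return kept


# Usage with made-up values: four scene-level grasp candidates, one target box.
intr = (500.0, 500.0, 320.0, 240.0)
centers = [(0.0, 0.0, 1.0), (0.2, 0.0, 1.0), (-0.3, 0.1, 1.0), (0.0, 0.0, -1.0)]
scores = [0.5, 0.9, 0.8, 0.99]
target_box = (300.0, 200.0, 450.0, 280.0)
selected = filter_grasps_by_box(centers, scores, target_box, intr)
print(selected)  # [1, 0]
```

Filtering on the projected grasp center is the simplest choice here; a real system might instead test the gripper contact points or use a segmentation mask, which this sketch does not attempt.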

Keywords

GRASP, Computer vision, Computer science, Artificial intelligence, Human–computer interaction, Programming language
