Metal Parts’ Zero-Shot 6D Pose Estimation via Foundation Model and Template Update for Industrial Scenario

Han Xu Sun, Yizhao Wang, Zhenning Zhou, Nailong Liu, Randolph Osivue Odekhe, Qixin Cao

Year
2025
Citations
3

Abstract

6D pose estimation of metal parts is essential in industrial robotic applications. Although 6D pose estimation methods that rely on object-specific training have attracted extensive attention, they cannot generalize to novel objects. Recent novel-object pose estimation methods address this issue with task-specific fine-tuned CNNs for deep template matching. However, these methods require expensive training procedures and do not consider the pose distribution of the rendered templates. Foundation models have recently shown strong representation learning ability and can encode both spatial and semantic information. In this study, we present a zero-shot 6D pose estimation method for metal parts that uses a foundation model without re-training and incorporates prior information about the parts’ poses to generate rendered templates that align with the real-world pose distribution. DINOv2, a recent vision foundation model with impressive generalization capabilities, is employed to match rendered templates against query images of metal parts, and we introduce optimal transport as the similarity metric. GlueStick is then used to establish local keypoint correspondences, from which geometric correspondences are derived and used to estimate the metal part’s 6D pose with PnP/RANSAC. Given an industrial scenario, we first estimate the metal parts’ poses, then select high-confidence pose results and save them to a pose buffer, which is used to update the templates. Experiments on the MP6D and ROBI datasets show that the proposed method outperforms MegaPose. We also conduct real-world experiments, which demonstrate the robustness of the proposed method. Code is available at https://github.com/sunhan1997/IndusPose.
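
The abstract names optimal transport as the similarity metric for template matching but does not give its formulation. The following is a minimal sketch of one common way to do this, not the authors' implementation: it assumes each image is represented by a set of patch feature vectors (such as those a DINOv2 backbone would produce), uses a cosine cost between patches, and solves an entropy-regularized transport problem with Sinkhorn iterations under uniform marginals. The function names, the regularization value `eps`, and the iteration count are all hypothetical choices made for illustration. Plain Python lists are used so the example has no dependencies; an array library would be the usual choice for real feature maps.

```python
import math


def cosine_cost(a, b):
    """Cost matrix C[i][j] = 1 - cosine similarity between a[i] and b[j]."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
        return [x / norm for x in v]

    ua = [unit(v) for v in a]
    ub = [unit(v) for v in b]
    return [[1.0 - sum(x * y for x, y in zip(p, q)) for q in ub] for p in ua]


def sinkhorn_plan(cost, eps=0.1, iters=200):
    """Entropy-regularized transport plan with uniform marginals (assumption)."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    row_mass, col_mass = 1.0 / n, 1.0 / m
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the target marginals.
        u = [row_mass / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [col_mass / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_similarity(query_feats, template_feats, eps=0.1, iters=200):
    """Similarity = 1 - transport cost; higher means a better template match."""
    cost = cosine_cost(query_feats, template_feats)
    plan = sinkhorn_plan(cost, eps, iters)
    transport_cost = sum(
        plan[i][j] * cost[i][j]
        for i in range(len(cost))
        for j in range(len(cost[0]))
    )
    return 1.0 - transport_cost


def best_template(query_feats, templates):
    """Return the index of the template with the highest OT similarity."""
    scores = [ot_similarity(query_feats, t) for t in templates]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch, a query would be scored against every rendered template with `ot_similarity`, and `best_template` picks the top match. Because the transport plan is free to pair any query patch with any template patch, the score does not depend on the order in which patches are listed, which is one reason a set-to-set metric like this can be preferable to comparing a single global descriptor.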

Keywords

Zero (linguistics); Foundation (evidence); Computer science; One shot; Shot (pellet); Artificial intelligence; Computer vision; Engineering drawing; Engineering; Mechanical engineering
