SF-Pose: Semantic-Fusion Six Degrees of Freedom Object Pose Estimation via Pyramid Transformer for Industrial Scenarios
Jikun Wang, Yinlong Liu, Zhi-Xin Yang
- Year: 2025
- Citations: 2
Abstract
Six degrees of freedom (6-DoF) object pose estimation is a powerful vision technique for robot-environment interaction. However, current robust pose estimation algorithms rely heavily on labeled real data, which is costly to collect and makes these algorithms difficult to deploy. Many studies discuss the use of synthetic data as a complement to real datasets, but reducing the gap between synthetic and real data remains a challenging problem. Because an object's geometric characteristics are consistent between real and synthetic data, we argue that multi-input, rather than image-only input, is better suited to synthetic-to-real transfer, since it strengthens the extraction of object geometric features. We therefore propose a semantic-fusion 6-DoF object pose estimation method that captures common features across various resolutions by employing a purpose-designed pyramid transformer feature-fusion module. Extensive experiments show that the proposed method outperforms state-of-the-art (SOTA) methods, indicating that it can effectively extract and fuse different representations. Furthermore, in response to the lack of industrial scene datasets, we develop a synthetic pose dataset and conduct a human-robot collaboration experiment to verify the robustness of the proposed method.
Note to Practitioners: The purpose of this paper is to bridge the gap between synthetic and real data for pose estimation of industrial tools. Our method can be trained on synthetic data alone and still accurately estimate pose parameters in real scenes. By combining a physically based renderer with models of industrial tools, such as hammers and screwdrivers, a synthetic dataset of industrial scenes can be produced using the data production pipeline proposed in this paper. The trained model can then help a robot vision system understand object pose information in a real production workshop. Extensive dataset experiments and human-robot collaboration experiments demonstrate the effectiveness of the proposed method. In addition, practitioners can produce industrial datasets covering multiple viewing angles, objects, and scenes, based on the actual robot working environment. Sufficiently large and varied datasets can enhance the model's generalization and robustness.
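For readers unfamiliar with the term, a 6-DoF pose consists of a 3D rotation (three degrees of freedom) and a 3D translation (three more) that together map points from the object's own coordinate frame into the camera frame. The following is a minimal sketch of that representation only, not the authors' network or pipeline; the function names, the axis-angle parameterization, and the example values are all assumptions chosen for illustration.

```python
import math

def rotation_from_axis_angle(axis, angle):
    """Rodrigues' formula: 3x3 rotation matrix for `angle` radians about `axis`."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("axis must be non-zero")
    x, y, z = x / norm, y / norm, z / norm
    c, s = math.cos(angle), math.sin(angle)
    k = 1.0 - c
    return [
        [k * x * x + c,     k * x * y - s * z, k * x * z + s * y],
        [k * x * y + s * z, k * y * y + c,     k * y * z - s * x],
        [k * x * z - s * y, k * y * z + s * x, k * z * z + c],
    ]

def apply_pose(R, t, points):
    """Map object-frame points to the camera frame: p_cam = R * p_obj + t."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]

# Hypothetical pose: a quarter turn about the camera z-axis plus a small offset.
R = rotation_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
t = (0.1, 0.0, 0.5)

# Hypothetical object model points (e.g. two points on a tool's CAD model).
model_points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
camera_points = apply_pose(R, t, model_points)
```

A pose estimator, whatever its architecture, outputs some parameterization of `R` and `t`; applying them to the object's model points, as above, is how an estimated pose is typically compared against a ground-truth one.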