HRI

SF-Pose: Semantic-Fusion Six Degrees of Freedom Object Pose Estimation via Pyramid Transformer for Industrial Scenarios

Jikun Wang, Yinlong Liu, Zhi-Xin Yang

Year: 2025
Citations: 2

Abstract

Six degrees of freedom (6-DoF) object pose estimation is a powerful vision technique for robot-environment interaction. However, current robust pose estimation algorithms rely heavily on labeled real data, which is costly to collect, and this makes the algorithms difficult to deploy. Many studies discuss the use of synthetic data as a complement to real datasets, but reducing the gap between synthetic and real data remains a challenging problem. Because object geometric characteristics are consistent between real and synthetic data, we argue that multi-input, rather than image-only input, is more suitable for transfer from synthetic to real, since it strengthens the extraction of object geometric features. We therefore propose a semantic-fusion 6-DoF object pose estimation method that effectively captures common features across various resolutions by employing a purpose-designed pyramid transformer feature-fusion module. Extensive experiments show that the proposed method outperforms the state of the art (SOTA), indicating that it can effectively extract and fuse different representations. Furthermore, in response to the lack of industrial scene datasets, we also develop a synthetic pose dataset and conduct a human-robot collaboration experiment to verify the robustness of the proposed method.

Note to Practitioners: The purpose of this paper is to bridge the gap between synthetic and real data for pose estimation of industrial tools. Our method can be trained on synthetic data only and still accurately estimate pose parameters in real scenes. By combining a physically based renderer with industrial tools, such as hammers and screwdrivers, a synthetic dataset of industrial scenes can be produced using the data production pipeline proposed in this paper. The trained model can then assist the robot vision system in understanding object pose information in a real production workshop. Extensive dataset experiments and human-robot collaboration experiments demonstrate the effectiveness of the proposed method. In addition, based on the actual robot working environment, practitioners can produce industrial datasets covering multiple angles, objects, and scenes. Sufficient datasets can enhance the model’s generalization and robustness.
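For readers unfamiliar with the term, a 6-DoF pose is a rigid transform made of a 3D rotation (three degrees of freedom) and a 3D translation (three more). The sketch below only illustrates that representation. It is not the SF-Pose method, and the function names `rotation_from_axis_angle` and `apply_pose` are hypothetical helpers chosen for this example.

```python
import math


def rotation_from_axis_angle(axis, angle):
    """Build a 3x3 rotation matrix with Rodrigues' formula.

    `axis` is any non-zero 3-vector; `angle` is in radians.
    (Hypothetical helper for illustration, not from the paper.)
    """
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("axis must be non-zero")
    x, y, z = x / norm, y / norm, z / norm
    c, s = math.cos(angle), math.sin(angle)
    k = 1.0 - c
    return [
        [c + x * x * k, x * y * k - z * s, x * z * k + y * s],
        [y * x * k + z * s, c + y * y * k, y * z * k - x * s],
        [z * x * k - y * s, z * y * k + x * s, c + z * z * k],
    ]


def apply_pose(rotation, translation, point):
    """Map a point from the object frame to the camera frame: R @ p + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# A pose that rotates 90 degrees about the z-axis and shifts 0.5 along z.
R = rotation_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
t = (0.0, 0.0, 0.5)
moved = apply_pose(R, t, (1.0, 0.0, 0.0))
print([round(v, 6) for v in moved])
```

A pose estimator of the kind described in the abstract would output `R` and `t` for each detected object. Here they are set by hand purely to show what the six parameters encode.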

Keywords

Pose, Transformer, Artificial intelligence, Computer vision, Computer science, Fusion, Sensor fusion, 3D pose estimation, Engineering, Voltage
