Enhancing object pose estimation for RGB images in cluttered scenes

Metwalli Al-Selwi, Ning Huang, Gao Yin, Yan Chao, Qiming Li, Jun Li

Year: 2025
Citations: 17
Access: Open access

Abstract

Estimating the 6D pose of objects is crucial for robots to interact with their environment. Estimating 6D object pose from RGB images of cluttered scenes with heavy occlusion remains a critical challenge. Most existing methods estimate object pose in two stages: first they extract object features, and then they apply a PnP/RANSAC method to recover the pose. However, most of these techniques merely localize a set of keypoints by regressing their coordinates, which makes them vulnerable to occlusion and leads to poor performance in multi-object pose estimation. Such methods also cannot regress the 6D pose directly from a loss during training. In this paper, we propose a framework based on a convolutional neural network (CNN) and a self-attention mechanism as an end-to-end method for single- and multi-object 6D pose estimation from RGB images at low computational cost. Our method uses feature fusion to extract local features and combines multi-head self-attention (MHSA) with iterative refinement to improve pose estimation performance. Furthermore, our method can be scaled according to the available computational resources. Experiments on the Linemod and Occlusion Linemod benchmark datasets show that our method achieves 97.45% and 84.84%, respectively, in terms of the ADD(-S) metric.
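
For readers unfamiliar with the ADD(-S) metric cited above, the following is a minimal Python sketch of how it is commonly computed; it is not the authors' evaluation code. ADD averages the distance between corresponding model points under the ground-truth and predicted poses, while ADD-S, used for symmetric objects, averages the distance from each ground-truth point to its closest predicted point. The function names, the toy four-point model, and the acceptance threshold of 10% of the object diameter are assumptions made for illustration; the abstract does not state the threshold used.

```python
import math

def transform(points, R, t):
    """Apply rotation R (3x3 nested lists) and translation t (length 3) to each 3D point."""
    return [
        tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def add_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding model points under the two poses."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(math.dist(a, b) for a, b in zip(gt, pred)) / len(points)

def add_s_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to its closest predicted point."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(min(math.dist(a, b) for b in pred) for a in gt) / len(points)

def is_correct(error, diameter, threshold=0.1):
    """Assumed convention: a pose counts as correct if error < threshold * object diameter."""
    return error < threshold * diameter

# Hypothetical toy model: four points on a square, symmetric under 90-degree turns about z.
SQUARE = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ROT_Z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
ZERO = (0, 0, 0)

add_error = add_metric(SQUARE, IDENTITY, ZERO, ROT_Z_90, ZERO)
add_s_error = add_s_metric(SQUARE, IDENTITY, ZERO, ROT_Z_90, ZERO)
```

In this toy case the predicted pose is off by a quarter turn about an axis of symmetry, so ADD penalizes every point while ADD-S reports zero error. That difference is why the symmetric variant is typically used for symmetric objects.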

Keywords

Artificial intelligence, Computer vision, Pose, Computer science, RGB color model, Object (grammar), Estimation, Pattern recognition (psychology)
