Enhancing object pose estimation for RGB images in cluttered scenes
Metwalli Al-Selwi, Ning Huang, Gao Yin, Yan Chao, Qiming Li, Jun Li
- Publication year
- 2025
- Citations
- 17
- Access
- Open access
Abstract
Estimating the 6D pose of objects is crucial for robots that interact with their environment. Estimating 6D object pose from RGB images in cluttered scenes with heavy occlusion remains a critical problem. Most existing methods estimate object pose in two stages: they first extract object features and then apply a PnP/RANSAC method to recover the pose. However, most of these techniques merely localize a set of keypoints by regressing their coordinates, which is vulnerable to occlusion and performs poorly for multi-object pose estimation. Moreover, these methods cannot regress the 6D pose directly from a loss during training. In this paper, we propose an end-to-end framework based on a convolutional neural network (CNN) and a self-attention mechanism for single- and multi-object 6D pose estimation from RGB images at low computational cost. Our method uses feature fusion to extract local features and combines multi-head self-attention (MHSA) with iterative refinement to improve pose estimation performance. Furthermore, our method can be scaled according to the available computational resources. Experiments on the Linemod and Occlusion Linemod benchmark datasets show that our method achieves 97.45% and 84.84%, respectively, in terms of the ADD(-S) metric.
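The ADD(-S) metric reported above can be illustrated with a minimal sketch. The abstract does not define the metric, so the code below assumes the definition commonly used with Linemod: ADD is the mean distance between corresponding 3D model points transformed by the ground-truth and predicted poses, ADD-S replaces the correspondence with the closest predicted point for symmetric objects, and a pose counts as correct when the error is below 10% of the object diameter. The function names and the 0.1 threshold are illustrative assumptions, not taken from the paper.

```python
import math

def transform(points, R, t):
    """Apply rotation R (3x3 nested list) and translation t (length 3) to each 3D point."""
    return [
        tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def add_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding model points under the two poses."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(math.dist(a, b) for a, b in zip(gt, pred)) / len(points)

def add_s_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to its closest predicted point."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(min(math.dist(a, b) for b in pred) for a in gt) / len(points)

def is_correct(error, diameter, threshold=0.1):
    """Assumed convention: a pose is correct if the error is below threshold * diameter."""
    return error < threshold * diameter

# Toy model: four points arranged symmetrically around the z-axis.
model = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
rot_z_180 = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]
zero = [0, 0, 0]

# A 180-degree rotation about z maps the point set onto itself, so ADD
# penalizes the pose while ADD-S treats it as a perfect match.
add_err = add_metric(model, identity, zero, rot_z_180, zero)
add_s_err = add_s_metric(model, identity, zero, rot_z_180, zero)
```

In this toy case every point lands on its opposite, so ADD penalizes a pose that is visually indistinguishable from the ground truth, while ADD-S does not. That is why the combined ADD(-S) score typically applies ADD-S to symmetric objects and ADD to the rest.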