PERCEPTION

Leveraging Positional Encoding for Robust Multi-Reference-Based Object 6D Pose Estimation

Jaewoo Park, Jaeguk Kim, Nam Ik Cho

Year of publication: 2024
Access: Open access

Abstract

Accurately estimating the pose of an object is a crucial task in computer vision and robotics. Two main deep learning approaches address it: geometric representation regression and iterative refinement. However, both have limitations that reduce their effectiveness. In this paper, we analyze these limitations and propose new strategies to overcome them. To tackle the issue of blurry geometric representation, we use positional encoding with high-frequency components for the object's 3D coordinates. To address the local minimum problem in refinement methods, we introduce a normalized image plane-based multi-reference refinement strategy that is independent of intrinsic matrix constraints. Lastly, we utilize adaptive instance normalization and a simple occlusion augmentation method to help our model concentrate on the target object. Our experiments on the Linemod, Linemod-Occlusion, and YCB-Video datasets demonstrate that our approach outperforms existing methods. We will soon release the code.
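
To make the positional-encoding idea concrete, here is a minimal sketch of how 3D object coordinates could be lifted into sinusoidal features that include high-frequency components. The abstract does not give the exact formulation, so the frequency schedule (powers of two times pi), the function name `positional_encoding`, and the parameter `num_freqs` are assumptions chosen for illustration and may differ from what the authors use.

```python
import numpy as np


def positional_encoding(coords, num_freqs=4):
    """Encode (..., 3) coordinates with sin/cos at frequencies 2^k * pi.

    Hypothetical helper: the frequency schedule is an assumption,
    not the formulation confirmed by the paper.
    """
    coords = np.asarray(coords, dtype=np.float64)
    # Frequencies pi, 2*pi, 4*pi, ...; higher k captures finer detail.
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    # Broadcast each coordinate against every frequency: (..., 3, num_freqs).
    angles = coords[..., None] * freqs
    # Per coordinate: all sines first, then all cosines: (..., 3, 2 * num_freqs).
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    # Flatten the per-coordinate features into a single vector per point.
    return enc.reshape(*coords.shape[:-1], -1)


# One 3D point, assumed to be normalized to roughly [-1, 1] per axis.
points = np.array([[0.0, 0.5, -0.25]])
features = positional_encoding(points, num_freqs=4)
print(features.shape)
```

Each point yields `3 * 2 * num_freqs` values. A regression target built this way varies quickly across neighboring surface points, which is one plausible reading of how high-frequency components could counter blurry coordinate maps.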

Keywords

cs.CV

