Vote-based multimodal fusion for hand-held object pose estimation
Dinh-Cuong Hoang, Phan Xuan Tan, Anh-Nhat Nguyen, Duc-Long Pham, Van-Duc Vu, Van-Thiep Nguyen, Thu-Uyen Nguyen, Duc-Thanh Tran, Khanh-Toan Phan, Xuan-Tung Dinh, Van-Hiep Duong, Ngoc-Trung Ho, Hai-Nam Pham, Viet-Anh Trinh, Son-Anh Bui
- Publication year: 2025
- Citations: 1
Abstract
Estimating the pose of hand-held objects is a critical and challenging task in computer vision and robotics, with applications in robotic manipulation, human–robot interaction, and augmented reality (AR). Leveraging multi-modal data, such as color (RGB) and depth (D) images, provides a promising avenue for addressing these challenges. However, existing approaches face two significant limitations. First, hand-induced occlusions often obscure critical object features, limiting the accuracy of conventional pose estimation methods. Second, most current techniques extract features from separate backbones and fuse them at the feature level, which can lead to representation distribution shifts and performance disruptions during fine-tuning due to dense interactions between RGB and depth branches. In this work, we propose a novel deep neural network for hand-held object pose estimation using RGB-D images as input. Our approach introduces a vote-based fusion mechanism that dynamically integrates multimodal data, effectively addressing occlusions and representation misalignments. Additionally, we incorporate hand-object keypoint interactions through a specialized module, enabling more accurate pose estimation in complex scenarios. Experiments on three public datasets demonstrate significant improvements in accuracy and robustness, with accuracy gains of up to 15% over state-of-the-art methods. Furthermore, on-site experimental verification highlights the practicality of our framework, achieving an average precision of 76.8% and outperforming existing methods by margins of up to 13.9%. The proposed method also achieves competitive inference times of 40 ms without refinement and 200 ms with refinement, demonstrating its suitability for real-world applications.