首页 /研究 /Multi-modal 3D Human Tracking for Robots in Complex Environment with Siamese Point-Video Transformer

LOCOMOTION

Multi-modal 3D Human Tracking for Robots in Complex Environment with Siamese Point-Video Transformer

Shuo Xin, Zhen Zhang, Mengmeng Wang, Xiaojun Hou, Yaowei Guo, Xiao Kang, Liang Liu, Yong Liu

发表年份: 2024
引用次数: 4

摘要

Tracking a specific person in 3D scene is gaining momentum due to its numerous applications in robotics. Currently, most 3D trackers focus on driving scenarios with neglected jitter and uncomplicated surroundings, which results in their severe degeneration in complex environments, especially on jolting robot platforms (only 20-60% success rate). To improve the accuracy, a Point-Video-based Transformer Tracking model (PVTrack) is presented for robots. It is the first multi-modal 3D human tracking work that incorporates point clouds together with RGB videos to achieve information complementarity. Moreover, PVTrack proposes the Siamese Point-Video Transformer for feature aggregation to overcome dynamic environments, which captures more target-aware information through the hierarchical attention mechanism adaptively. Considering the violent shaking on robots and rugged terrains, a lateral Human-ware Proposal Network is designed together with an Anti-shake Proposal Compensation module. It alleviates the disturbance caused by complex scenes as well as the particularity of the robot platform. Experiments show that our method achieves state-of-the-art performance on both KITTI/Waymo datasets and a quadruped robot for various indoor and outdoor scenes.

关键词

RobotComputer scienceTransformerModalComputer visionArtificial intelligenceTracking (education)Human–robot interactionHuman–computer interactionEngineering

Multi-modal 3D Human Tracking for Robots in Complex Environment with Siamese Point-Video Transformer

摘要

关键词

相关论文

Statistical Learning Theory

Artificial intelligence: a modern approach

Applied Nonlinear Control

A new optimizer using particle swarm theory