HRI

Skeleton-Aware Representation of Spatio-Temporal Kinematics for 3D Human Motion Prediction

Songlin Du, Zhihan Zhuang, Zenghui Wang, Takeshi Ikenaga

Publication year
2025
Citations
3

Abstract

3D human motion prediction, which attempts to anticipate human behavior, is an important problem in computer vision. Attention-based neural networks and graph convolution networks (GCNs) have recently shown great promise in 3D skeleton-based human motion prediction because they learn spatial and temporal kinematics well. However, existing methods have several critical issues: 1) spatial dependencies for distal joints in each independent frame are hard to learn; 2) the GCN ignores the hierarchical structure and diverse motion patterns of different body parts; 3) existing methods disregard the statistical interdependence inherent in time series data. To address these issues, this paper proposes a skeleton-aware representation of spatio-temporal kinematics for 3D human motion prediction. The proposed method makes three key contributions: a learnable temporal aggregation, a skeleton-aware spatio-temporal attention, and an upper/lower decoupling GCN. The learnable temporal aggregation selectively obtains past information by leveraging the dependencies between each time step and its historical moments. The skeleton-aware spatio-temporal attention leverages the self-attention mechanism and a designed adjacency matrix to model the skeleton constraints of distal joints. The upper/lower decoupling GCN introduces a grouping strategy to learn the dynamics of different body parts separately. Experimental results on three publicly available datasets demonstrate that the proposed method achieves state-of-the-art performance for both short-term and long-term prediction.

Note to Practitioners: 3D human motion prediction forms a fundamental component of human-centered automation systems by enabling safer, more efficient, and more natural interactions between humans and machines. This paper was motivated by three challenges in predicting human motion: 1) explicitly capturing the complex spatial patterns of distal joints is difficult; 2) neglecting the variation in motion dynamics between body parts is problematic; 3) neglecting the statistical interdependence inherent in time series data of human motion leads to poor performance. This paper proposes a skeleton-aware representation of spatio-temporal kinematics for 3D human motion prediction through three innovations: a learnable temporal aggregation, a skeleton-aware spatio-temporal attention, and an upper/lower decoupling GCN. The three contributions address these weaknesses of existing work and make a pioneering attempt at a skeleton-aware representation of spatio-temporal human kinematics. The method is expected to significantly advance the development of many automation systems that rely on human motion prediction, such as human-robot interaction and teleoperation.
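The abstract does not say how the adjacency matrix enters the attention or how the upper/lower grouping is wired, so the following is only a rough sketch of two of the three ideas under stated assumptions, not the authors' implementation. It assumes a hypothetical 7-joint toy skeleton, an additive adjacency bias on the attention scores, and a weight-free neighbour-averaging step standing in for a GCN layer. All names (`BONES`, `UPPER`, `LOWER`, `skeleton_biased_attention`, `decoupled_step`) are illustrative. The learnable temporal aggregation is not covered.

```python
import math

# Hypothetical 7-joint toy skeleton (assumption; not a dataset layout from the paper):
# 0 pelvis, 1 spine, 2 head, 3 left hand, 4 right hand, 5 left foot, 6 right foot
NUM_JOINTS = 7
BONES = [(0, 1), (1, 2), (1, 3), (1, 4), (0, 5), (0, 6)]
UPPER = [1, 2, 3, 4]  # assumed upper-body group
LOWER = [0, 5, 6]     # assumed lower-body group


def adjacency(bones, keep=None):
    """Symmetric adjacency with self-loops, optionally restricted to joints in `keep`."""
    keep = set(range(NUM_JOINTS)) if keep is None else set(keep)
    a = [[0.0] * NUM_JOINTS for _ in range(NUM_JOINTS)]
    for i in keep:
        a[i][i] = 1.0
    for i, j in bones:
        if i in keep and j in keep:
            a[i][j] = a[j][i] = 1.0
    return a


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def skeleton_biased_attention(x, adj, bias=1.0):
    """Single-head self-attention over joints (queries = keys = values = x).

    Assumption: bone-connected pairs get an additive score bonus `bias`.
    """
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        scores = [
            sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d) + bias * adj[i][j]
            for j, k in enumerate(x)
        ]
        w = softmax(scores)
        out.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return out


def graph_step(x, adj):
    """Row-normalised neighbour averaging; joints with an all-zero row pass through."""
    out = []
    for i, row in enumerate(adj):
        deg = sum(row)
        if deg == 0:
            out.append(list(x[i]))
        else:
            out.append([
                sum(row[j] * x[j][c] for j in range(len(x))) / deg
                for c in range(len(x[0]))
            ])
    return out


def decoupled_step(x):
    """Update upper and lower joints separately; cross-group bones are dropped."""
    y = graph_step(x, adjacency(BONES, keep=UPPER))
    return graph_step(y, adjacency(BONES, keep=LOWER))
```

In this sketch the pelvis-spine bone crosses the two groups and is therefore ignored by `decoupled_step`, so upper-body features never mix with lower-body features. A trained model would add learnable projection and weight matrices, which are omitted here for brevity.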

Keywords

Kinematics, Skeleton (computer programming), Representation (politics), Computer vision, Motion (physics), Artificial intelligence, Computer science, Motion capture, Physics, Classical mechanics
