UniHM: Unified Dexterous Hand Manipulation with Vision Language Model
Zhenhao Zhang, Jiaxin Liu, Ye Shi, Jingya Wang
- Year
- 2026
- Access
- Open access
Abstract
Planning physically feasible dexterous hand manipulation is a central challenge in robotic manipulation and Embodied AI. Prior work typically relies on object-centric cues or precise hand-object interaction sequences, foregoing the rich, compositional guidance of open-vocabulary instruction. We introduce UniHM, the first framework for unified dexterous hand manipulation guided by free-form language commands. We propose a Unified Hand-Dexterous Tokenizer that maps heterogeneous dexterous-hand morphologies into a single shared codebook, improving cross-dexterous hand generalization and scalability to new morphologies. Our vision language action model is trained solely on human-object interaction data, eliminating the need for massive real-world teleoperation datasets, and demonstrates strong generalizability in producing human-like manipulation sequences from open-ended language instructions. To ensure physical realism, we introduce a physics-guided dynamic refinement module that performs segment-wise joint optimization under generative and temporal priors, yielding smooth and physically feasible manipulation sequences. Across multiple datasets and real-world evaluations, UniHM attains state-of-the-art results on both seen and unseen objects and trajectories, demonstrating strong generalization and high physical feasibility. Our project page at \href{https://unihm.github.io/}{https://unihm.github.io/}.
Keywords
Related papers
Review and perspectives on multimodal perception, mutual cognition, and embodied execution for human–robot collaboration in Industry 5.0
Kai Ding, Qingyuan Mao, Yaqian Zhang +3 more
Robotics and Computer-Integrated Manufacturing · 2026
Towards human-centric manufacturing: Task planning under uncertainties in human–robot collaborative assembly
Yingchao You, Ze Ji, Changyun Wei
Robotics and Computer-Integrated Manufacturing · 2026
Agentic HRC: Achieving context alignment via memory for Human–Robot Collaboration
Jiahui Si, Wenchao Li, Xi Chen +4 more
Robotics and Computer-Integrated Manufacturing · 2026
Adaptive Physics-informed Transformer with Gaussian process residual compensation for inverse dynamics modeling in Human–Robot Collaboration
Rui Qian, Xi Zhang, Dongpeng Li +2 more
Robotics and Computer-Integrated Manufacturing · 2026