A Data Capture and Gesture Recognition System to Enable Human-Robot Collaboration
Sonam Naidu, Evan M. Smith, Camp Hagood, Aramis Rolly, Sujan Sarker, Cory J. Hayes
- Year: 2025
Abstract
Effective human-robot collaboration (HRC) relies on intuitive and reliable communication modalities, particularly in dynamic environments where traditional verbal or wearable sensor-based systems may be unreliable. Gesture-based communication offers a natural and non-intrusive alternative, but it remains challenging because current recognition systems depend on large labeled datasets and adapt poorly to varying environmental conditions. Recent advances in vision-language models (VLMs) have shown promise in video understanding and general reasoning. However, they often lack the domain-specific context required for accurate classification in specialized applications. To address these challenges, we introduce a novel gesture recognition system that leverages a VLM guided by retrieval-augmented generation (RAG) and chain-of-thought (CoT) prompting to supply contextual understanding and reasoning. Our system captures upper-body gestures using an Azure Kinect, extracts sampled frames, and classifies them using GPT-4o enhanced by RAG from military gesture documentation and CoT reasoning strategies. Recognized gestures are encoded as ROS 2 messages and transmitted using a publisher-subscriber model to command a mobile robot to execute the corresponding actions. We validate our approach through controlled experiments using seven U.S. Marine Corps (USMC) gestures. The system achieved an accuracy of 80% and an F1 score of 89.9%, and demonstrated effective gesture-to-robot execution. Our results highlight the potential of VLMs for zero-shot gesture classification and robotic control, providing a foundation for robust, scalable, and field-deployable gesture-based HRC systems.
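The pipeline described in the abstract (sample frames, retrieve gesture documentation, build a CoT prompt, map the recognized label to a robot command) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the gesture names, documentation snippets, velocity values, and all function names are hypothetical placeholders, retrieval is a simple word-overlap ranking, and the GPT-4o call and ROS 2 publisher are deliberately left out.

```python
import re

# ASSUMPTION: placeholder snippets, not taken from the paper or from USMC documentation.
GESTURE_DOCS = {
    "halt": "Raise the hand to shoulder level with the palm facing forward.",
    "advance": "Swing the arm from rear to front, pointing in the direction of travel.",
}

# ASSUMPTION: hypothetical (linear m/s, angular rad/s) velocities per gesture label.
COMMANDS = {"halt": (0.0, 0.0), "advance": (0.2, 0.0)}


def sample_frame_indices(total_frames, num_samples):
    """Pick evenly spaced frame indices from the middle of equal-width bins."""
    if num_samples >= total_frames:
        return list(range(total_frames))
    return [int((i + 0.5) * total_frames / num_samples) for i in range(num_samples)]


def _tokens(text):
    """Lowercase alphabetic word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, docs, k=1):
    """Rank documentation entries by word overlap with the query (a toy stand-in for RAG retrieval)."""
    query_tokens = _tokens(query)
    ranked = sorted(docs, key=lambda name: (-len(query_tokens & _tokens(docs[name])), name))
    return ranked[:k]


def build_prompt(snippet_names, docs, num_frames):
    """Assemble a chain-of-thought style prompt that embeds the retrieved snippets."""
    context = "\n".join(f"- {name}: {docs[name]}" for name in snippet_names)
    return (
        f"You are given {num_frames} frames of an upper-body gesture.\n"
        f"Reference gesture descriptions:\n{context}\n"
        "Reason step by step about arm and hand position, "
        "then answer with exactly one gesture label."
    )


def to_command(label):
    """Map a recognized label to a velocity pair; unknown labels stop the robot."""
    return COMMANDS.get(label, (0.0, 0.0))


frames = sample_frame_indices(total_frames=100, num_samples=5)
names = retrieve("palm facing forward", GESTURE_DOCS, k=1)
prompt = build_prompt(names, GESTURE_DOCS, len(frames))
```

In a full system, the prompt and sampled frames would go to the VLM, and the velocity pair returned by `to_command` would be packed into a ROS 2 message by a publisher node. Defaulting unknown labels to a stop command is one conservative design choice for a misclassified gesture.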