A Data Capture and Gesture Recognition System to Enable Human-Robot Collaboration

Sonam Naidu, Evan M. Smith, Camp Hagood, Aramis Rolly, Sujan Sarker, Cory J. Hayes

Year
2025
Citations
1

Abstract

Effective human-robot collaboration (HRC) relies on intuitive and reliable communication modalities, particularly in dynamic environments where traditional verbal or wearable sensor-based systems may be unreliable. While gesture-based communication offers a natural and non-intrusive alternative, it remains challenging because current recognition systems depend on large labeled datasets and adapt poorly to varying environmental conditions. Recent advances in vision-language models (VLMs) have shown promise in video understanding and general reasoning. However, they often lack the domain-specific context required for accurate classification in specialized applications. To address these challenges, we introduce a novel gesture recognition system that leverages a VLM guided by retrieval-augmented generation (RAG) and chain-of-thought (CoT) prompting to supply contextual understanding and reasoning. Our system captures upper-body gestures using an Azure Kinect, extracts sampled frames, and classifies them using GPT-4o enhanced by RAG from military gesture documentation and CoT reasoning strategies. Recognized gestures are encoded as ROS 2 messages and transmitted using a publisher-subscriber model to command a mobile robot to execute the corresponding actions. We validate our approach through controlled experiments using seven U.S. Marine Corps (USMC) gestures. The system achieved an accuracy of 80%, an F1 score of 89.9%, and demonstrated effective gesture-to-robot execution. Our results highlight the potential of VLMs for zero-shot gesture classification and robotic control, providing a foundation for robust, scalable, and field-deployable gesture-based HRC systems.
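
The pipeline described in the abstract (sample frames, retrieve gesture documentation, prompt with chain-of-thought, map the label to a robot command) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the gesture labels, documentation snippets, command strings, and every function name here are hypothetical placeholders, the retriever is a toy word-overlap ranker driven by a text query, and the VLM is an injected callable rather than a real GPT-4o call. Publishing over ROS 2 is omitted; the returned command string stands in for whatever payload a publisher node would send.

```python
import re
from typing import Callable, Dict, List

# Hypothetical documentation snippets (placeholders, not the paper's gesture set).
GESTURE_DOCS: Dict[str, str] = {
    "halt": "Raise the hand to shoulder level with the palm facing forward.",
    "advance": "Swing the arm from rear to front in the direction of movement.",
    "assemble": "Raise the arm overhead and move the hand in large circles.",
}

# Hypothetical mapping from a gesture label to a robot command string.
GESTURE_TO_COMMAND: Dict[str, str] = {
    "halt": "stop",
    "advance": "move_forward",
    "assemble": "return_to_operator",
}


def sample_frame_indices(total_frames: int, num_samples: int) -> List[int]:
    """Pick evenly spaced frame indices from a clip."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]


def _words(text: str) -> set:
    """Lowercase a string and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, docs: Dict[str, str], top_k: int = 2) -> List[str]:
    """Toy retriever: rank labels by word overlap between query and snippet."""
    query_words = _words(query)
    ranked = sorted(
        docs,
        key=lambda label: len(query_words & _words(docs[label])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, docs: Dict[str, str], num_frames: int) -> str:
    """Assemble a prompt with retrieved context and a step-by-step instruction."""
    context = "\n".join(f"- {label}: {docs[label]}" for label in retrieve(query, docs))
    return (
        f"You are shown {num_frames} frames sampled from a gesture clip.\n"
        f"Reference documentation:\n{context}\n"
        f"Allowed labels: {', '.join(sorted(docs))}.\n"
        "Think step by step about arm and hand position, "
        "then finish with a line of the form 'Answer: <label>'."
    )


def parse_label(reply: str, labels: Dict[str, str]) -> str:
    """Extract the label after the last 'Answer:' marker, or 'unknown'."""
    idx = reply.lower().rfind("answer:")
    if idx == -1:
        return "unknown"
    candidate = reply[idx + len("answer:"):].strip().strip(".").lower()
    return candidate if candidate in labels else "unknown"


def classify(query: str, frame_indices: List[int],
             vlm: Callable[[str, List[int]], str]) -> str:
    """Build the prompt, call the injected model, and parse its answer."""
    prompt = build_prompt(query, GESTURE_DOCS, len(frame_indices))
    return parse_label(vlm(prompt, frame_indices), GESTURE_DOCS)


def to_command(label: str) -> str:
    """Map a label to a command; unrecognized labels become a no-op (assumption)."""
    return GESTURE_TO_COMMAND.get(label, "noop")


if __name__ == "__main__":
    # Stand-in for a real vision-language model call.
    def fake_vlm(prompt: str, frames: List[int]) -> str:
        return "The palm faces forward at shoulder level.\nAnswer: halt"

    frames = sample_frame_indices(30, 5)
    label = classify("hand raised with palm forward", frames, fake_vlm)
    print(label, to_command(label))
```

Injecting the model as a callable keeps the sketch testable without network access, and falling back to a no-op for unparseable answers is one conservative choice for a robot-facing interface; the paper may handle these cases differently.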

Keywords

Computer science, Gesture, Gesture recognition, Human–robot interaction, Robot, Human–computer interaction, Artificial intelligence, Motion capture, Computer vision, Motion (physics)
