Benchmarking YOLOv8 to YOLOv13 for robust hand gesture recognition in human–robot interaction
Yifang Gao, Wei Luo, Shunshun Zhang, Nur Syazreen Ahmad, Xiaojun Wang, Patrick Goh
- Year: 2025
- Citations: 6
- Access: Open access
Abstract
Real-time, accurate hand gesture detection is essential for safe and intuitive Human-Robot Interaction (HRI), enabling robots to interpret non-verbal cues and respond appropriately in dynamic environments. This study evaluates YOLOv8n through YOLOv13n on static hand gesture recognition using the TSL detection dataset, which contains 5469 grayscale images across 31 gesture classes. All models were trained with the same data augmentation protocol and assessed with standard object detection metrics: precision, recall, and mean average precision (mAP) computed at an IoU threshold of 0.50 and averaged over IoU thresholds from 0.50 to 0.95. Computational efficiency was measured by inference latency, frame rate, model size, and total training time. YOLOv9t achieved the best detection accuracy on every metric evaluated, with the highest mAP at 0.50 (0.990), mAP at 0.50 to 0.95 (0.876), precision (0.975), and recall (0.966). YOLOv10n, in contrast, achieved the lowest inference latency (0.7 ms). These findings highlight the trade-off between accuracy and efficiency in gesture detection and indicate that YOLOv9t is a strong choice for accuracy-critical applications and YOLOv10n for latency-critical ones.
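For readers unfamiliar with the metrics named in the abstract, the following is a minimal illustrative sketch, not the authors' evaluation code. It shows how IoU between two boxes is computed, how an IoU threshold decides whether a detection counts as a match, and how a per-image latency maps to an upper bound on frame rate. The function names, the `(x1, y1, x2, y2)` box format, and the 0.05 threshold step for the 0.50 to 0.95 range (the common COCO-style convention) are assumptions made for illustration; the paper's reported frame rates may also include pre- and post-processing time that this conversion ignores.

```python
from typing import List, Sequence


def box_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Sequence[float], truth: Sequence[float],
             iou_threshold: float = 0.50) -> bool:
    """A prediction matches a ground-truth box if IoU meets the threshold."""
    return box_iou(pred, truth) >= iou_threshold


def iou_thresholds(start: float = 0.50, stop: float = 0.95,
                   step: float = 0.05) -> List[float]:
    """Thresholds averaged over for mAP at 0.50 to 0.95 (assumed 0.05 step)."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]


def latency_ms_to_fps(latency_ms: float) -> float:
    """Upper bound on frames per second if inference is the only per-frame cost."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    truth = (0.0, 0.0, 10.0, 10.0)
    pred = (2.0, 0.0, 12.0, 10.0)
    print(f"IoU: {box_iou(pred, truth):.3f}")
    print(f"Match at 0.50: {is_match(pred, truth, 0.50)}")
    print(f"Match at 0.75: {is_match(pred, truth, 0.75)}")
    print(f"Thresholds: {iou_thresholds()}")
    print(f"FPS bound at 0.7 ms: {latency_ms_to_fps(0.7):.0f}")
```

The same predicted box can count as correct at a threshold of 0.50 and incorrect at 0.75, which is why mAP averaged over 0.50 to 0.95 is typically lower than mAP at 0.50 and rewards tighter localization.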