HRI

Benchmarking YOLOv8 to YOLOv13 for robust hand gesture recognition in human–robot interaction

Yifang Gao, Wei Luo, Shunshun Zhang, Nur Syazreen Ahmad, Xiaojun Wang, Patrick Goh

Year: 2025
Citations: 6
Access: Open access

Abstract

Real-time and accurate hand gesture detection is essential for safe and intuitive Human-Robot Interaction (HRI), enabling robots to interpret non-verbal cues and respond appropriately in dynamic environments. This research evaluates the effectiveness of the YOLOv8n through YOLOv13n models in recognizing static hand gestures from the TSL detection dataset, which includes 5469 grayscale images across 31 gesture classes. The models were trained with uniform data augmentation protocols and assessed using object detection metrics: precision, recall, and mean average precision (mAP) computed at an IoU threshold of 0.50 as well as over the IoU interval from 0.50 to 0.95. Computational efficiency was evaluated in terms of inference latency, frame rate, model size, and total training duration. YOLOv9t exhibited the most robust detection accuracy across all evaluated metrics, achieving the highest mAP at 0.50 (0.990), mAP at 0.50 to 0.95 (0.876), precision (0.975), and recall (0.966). In contrast, YOLOv10n achieved the lowest inference latency (0.7 ms). These findings highlight the trade-off between accuracy and efficiency in gesture detection and show that YOLOv9t and YOLOv10n represent strong choices for accuracy-critical and latency-critical applications, respectively.
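
The metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the `(x1, y1, x2, y2)` box format, and the example counts are assumptions made for illustration. It shows how IoU decides whether a predicted box matches a ground-truth box at the 0.50 threshold, how precision and recall follow from match counts, and how a per-image latency relates to an inference-only frame rate.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.50):
    """A prediction counts as a true positive when IoU meets the threshold."""
    return iou(pred_box, gt_box) >= threshold


def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def fps_from_latency_ms(latency_ms):
    """Frames per second implied by a per-image latency in milliseconds.

    Assumption: only the timed stage is counted; pre- and post-processing
    would lower the end-to-end frame rate.
    """
    return 1000.0 / latency_ms
```

For example, `is_match((0, 0, 10, 10), (0, 0, 10, 8))` holds because the IoU is 0.8, while two 2×2 boxes offset by one unit in each direction overlap with an IoU of only 1/7 and would not count as a match at the 0.50 threshold.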

Keywords

Gesture, Benchmarking, Inference, Pattern recognition (psychology), Recall, Gesture recognition, Object detection, Precision and recall, Frame (networking)
