PERCEPTION

EgoVision a YOLO-ViT hybrid for robust egocentric object recognition

Umm e Sadima, Yazeed Alkharijah, Danish Hamid, Muhammad Ehatisham Ul Haq, Syed Muhammad Usman, Shehzad Khalid, Mohamad A. Alawad

Year of publication
2025
Citations
1
Access
Open access

Abstract

The rapid advancement of egocentric vision has opened new frontiers in computer vision, particularly in assistive technologies, augmented reality, and human-computer interaction. Despite this potential, object recognition from first-person perspectives remains challenging because of occlusion, motion blur, and frequent viewpoint changes. This paper introduces EgoVision, a novel, lightweight hybrid deep learning framework that fuses the spatial precision of YOLOv8 with the global contextual reasoning of Vision Transformers (ViT) to classify objects in static egocentric frames drawn from the HOI4D dataset. To the best of our knowledge, this is the first time a fused architecture has been applied to static object recognition on HOI4D, specifically targeting real-time use in robotics and augmented reality applications. The framework employs a key-frame extraction strategy and a feature pyramid network to handle multiscale spatial-temporal features efficiently, significantly reducing computational overhead for real-time applications. Extensive experiments demonstrate that EgoVision outperforms existing models across multiple metrics, achieving up to 99% accuracy on complex object classes such as 'Kettle' and 'Chair', while maintaining high efficiency for deployment on wearable and edge devices. The results establish EgoVision as a robust foundation for next-generation egocentric AI systems.
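The key-frame extraction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how EgoVision selects key frames, so the criterion used here (mean absolute pixel difference against the last kept frame), the function names `mean_abs_diff` and `select_key_frames`, and the default `threshold` are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Sequence

# A grayscale frame represented as rows of pixel intensities (illustrative type).
Frame = Sequence[Sequence[float]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def select_key_frames(frames: Sequence[Frame], threshold: float = 10.0) -> List[int]:
    """Return indices of frames that differ enough from the last kept key frame.

    Assumption: both the difference criterion and the threshold are
    hypothetical; the abstract does not describe EgoVision's actual rule.
    """
    if not frames:
        return []
    kept = [0]  # always keep the first frame as the initial reference
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[kept[-1]], frames[i]) > threshold:
            kept.append(i)
    return kept
```

Under this assumed rule, near-duplicate consecutive frames are skipped, so a downstream detector or classifier would only process frames where the scene has visibly changed, which is one plausible way such a strategy could reduce computational overhead.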

Keywords

Cognitive neuroscience of visual object recognition; Software deployment; Robotics; Transformer; Wearable computer; Deep learning; Object (grammar); Feature extraction; Augmented reality
