PERCEPTION

Comprehensive Performance Analysis of Object Detection Algorithms Across Diverse Sensor Modalities

Suvojit Acharjee, S. Ganguly, Asfak Ali, Jaroslav Frnda

Publication year
2025
Citations
1

Abstract

Object detection is a cornerstone of modern computer vision, driving advances in autonomous driving, robotics, surveillance, and smart infrastructure. However, detection performance and generalizability depend heavily on sensor modality and dataset characteristics. In this systematic review, conducted in accordance with PRISMA guidelines, we provide a comprehensive analysis of state-of-the-art object detection methods across diverse sensor types, including RGB cameras, LiDAR, radar, thermal, and depth sensors. We trace the evolution of benchmark datasets tailored to these modalities, examine how annotation strategies and sensor-specific features shape research directions, and highlight the growing adoption of multimodal and cross-modal datasets. Our findings reveal that, although object detection has traditionally relied on simple RGB images, recent years have seen a growing shift toward multimodal and cross-modal datasets, with 15%–20% of algorithms now incorporating LiDAR 3D data and 10%–15% utilizing multimodal inputs. Furthermore, we analyze and compare the performance of state-of-the-art detection algorithms across six publicly available datasets. Our findings show that YOLOv3, for instance, achieves a recall of 98.7% and a precision of 84.77% on RGB frames from KITTI, though its performance declines on more diverse datasets such as JUVDsiv1. Using LiDAR 3D data, Voxel R-CNN yields an average precision for the car class of 92% on KITTI's easy difficulty level, 82% on moderate, and 80.04% on hard. Our study also reveals that multimodal methods consistently improve detection accuracy and generalization across challenging datasets, outperforming traditional single-modal approaches. In particular, multimodal sensor fusion enhances performance on complex benchmarks such as KITTI and nuScenes. Finally, for 2D image data, RSDet provides strong results on the FLIR and LLVIP multimodal datasets.
These findings highlight the advantages of multimodal and cross-modal approaches and underscore the need for future research into sensor-agnostic frameworks, real-time efficiency, scalable deployment on resource-constrained edge devices, and data-efficient learning strategies for emerging sensing modalities. Supporting materials and extended analyses are available at: https://github.com/acharjeesuvo/Comprehensive-OD-Eval

Keywords

Modalities, Computer science, Algorithm, Artificial intelligence, Computer vision, Sociology
