Comprehensive Performance Analysis of Object Detection Algorithms Across Diverse Sensor Modalities
Suvojit Acharjee, S. Ganguly, Asfak Ali, Jaroslav Frnda
- Year: 2025
- Citations: 1
Abstract
Object detection is a cornerstone of modern computer vision, driving advances in autonomous driving, robotics, surveillance, and smart infrastructure. However, detection performance and generalizability depend heavily on sensor modality and dataset characteristics. In this systematic review, conducted in accordance with PRISMA guidelines, we provide a comprehensive analysis of state-of-the-art object detection methods across diverse sensor types, including RGB cameras, LiDAR, radar, thermal, and depth sensors. We trace the evolution of benchmark datasets tailored to these modalities, examine how annotation strategies and sensor-specific features shape research directions, and highlight the growing adoption of multimodal and cross-modal datasets. Our findings reveal that, although object detection has traditionally relied on simple RGB images, recent years have seen a growing shift toward multimodal and cross-modal datasets, with 15%–20% of algorithms now incorporating LiDAR 3D data and 10%–15% utilizing multimodal inputs. Furthermore, we analyze and compare the performance of state-of-the-art detection algorithms across six publicly available datasets. Our findings show that YOLOv3, for instance, achieves a recall of 98.7% and a precision of 84.77% on RGB frames from KITTI, though its performance declines on more diverse datasets such as JUVDsiv1. Using LiDAR 3D data, Voxel R-CNN yields an average precision of 92% on KITTI's easy difficulty level, 82% on moderate, and 80.04% on hard for the car class. Our study also reveals that multimodal methods consistently improve detection accuracy and generalization across challenging datasets, outperforming traditional single-modal approaches. In particular, multimodal sensor fusion enhances performance on complex benchmarks such as KITTI and nuScenes. Finally, for 2D image data, RSDet provides strong results on the FLIR and LLVIP multimodal datasets.
These findings highlight the advantages of multimodal and cross-modal approaches and underscore the need for future research into sensor-agnostic frameworks, real-time efficiency, scalable deployment on resource-constrained edge devices, and data-efficient learning strategies for emerging sensing modalities. Supporting materials and extended analyses are available at: https://github.com/acharjeesuvo/Comprehensive-OD-Eval