YOLOv4: Balancing Velocity with Vision for High-Performance Object Detection
Priyanka Kaushik
- Year: 2025
- Citations: 19
Abstract
Object detection remains one of the most challenging and impactful problems in computer vision, and the trade-off between speed and accuracy often limits practical deployment. Many architectural and training features have been proposed to improve Convolutional Neural Network (CNN) performance, but their effectiveness varies with the dataset, task, and model architecture. Some strategies, such as batch normalization and residual connections, have proven broadly beneficial, while others remain context-specific. In this work, we present YOLOv4, which systematically integrates and validates both universal and novel techniques to achieve state-of-the-art results. Key contributions include the incorporation of Weighted Residual Connections (WRC), Cross Stage Partial connections (CSP), Cross mini-Batch Normalization (CmBN), Self-Adversarial Training (SAT), Mish activation, Mosaic data augmentation, DropBlock regularization, and Complete IoU (CIoU) loss. These techniques are combined to maximize robustness, generalization, and inference efficiency. Extensive experiments on the MS COCO dataset show that YOLOv4 achieves 43.5% AP (65.7% AP50) at a real-time speed of ~65 FPS on a Tesla V100 GPU, outperforming existing object detection frameworks in both speed and precision. Beyond benchmark performance, YOLOv4 provides a practical and scalable solution for real-time computer vision applications, including autonomous driving, surveillance, robotics, and edge computing. This work advances the state of the art and establishes a reproducible and accessible framework that lets researchers and practitioners balance speed and accuracy in real-world detection tasks.
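Two of the components named in the abstract, Mish activation and CIoU loss, have compact closed forms. The following is a minimal illustrative sketch in plain Python based on the standard published definitions of those two functions; it is not taken from this paper's implementation. The function names `mish` and `ciou_loss`, the `(x1, y1, x2, y2)` corner box format, and the `eps` stabilizer are assumptions made for illustration.

```python
import math


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    # Numerically stable softplus: log(1 + e^x) = max(x, 0) + log1p(e^-|x|)
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(softplus)


def ciou_loss(pred, target, eps: float = 1e-9) -> float:
    """Complete IoU loss for two boxes given as (x1, y1, x2, y2).

    Assumed form: 1 - IoU + rho^2 / c^2 + alpha * v, where rho is the
    distance between box centers, c is the diagonal of the smallest
    enclosing box, and v measures aspect-ratio mismatch.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers
    rho2 = ((px1 + px2) - (tx1 + tx2)) ** 2 / 4 + ((py1 + py2) - (ty1 + ty2)) ** 2 / 4

    # Squared diagonal of the smallest box enclosing both
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

Under these assumptions, identical boxes give a loss near zero, while disjoint boxes give a loss above 1 because the center-distance term keeps contributing a penalty even when the IoU is zero.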