A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS

Abstract

YOLO has become a central real-time object detection system for robotics, driverless cars, and video monitoring applications. We present a comprehensive analysis of YOLO’s evolution, examining the innovations and contributions in each iteration from the original YOLO up to YOLOv8, YOLO-NAS, and YOLO with transformers. We start by describing the standard metrics and postprocessing; then, we discuss the major changes in network architecture and training tricks for each model. Finally, we summarize the essential lessons from YOLO’s development and provide a perspective on its future, highlighting potential research directions to enhance real-time object detection systems.
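As a concrete illustration of the "standard metrics and postprocessing" the abstract refers to, the sketch below shows intersection over union (IoU), the overlap measure that underlies average-precision metrics, and greedy non-maximum suppression (NMS), a common postprocessing step for object detectors. It is a minimal sketch and not code from the article: the function names `iou` and `nms`, the `(x1, y1, x2, y2)` corner box format, and the 0.5 threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    The corner format is an assumption made for this sketch.
    """
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    Visits boxes in descending score order and keeps a box only if it
    does not overlap any already kept box by more than iou_threshold.
    Returns the indices of the kept boxes. The default threshold is an
    illustrative choice, not a value taken from the article.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In the usage example, the second box overlaps the first heavily (IoU of 81/119) and has a lower score, so it is suppressed, while the third box is disjoint from the others and is kept.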

Author and article information

Contributors

Juan Terven

Diana-Margarita Córdova-Esparza

Julio-Alejandro Romero-González

Journal

Journal ID (publisher-id): MLKEAZ

Title: Machine Learning and Knowledge Extraction

Abbreviated Title: MAKE

Publisher: MDPI AG

ISSN (Electronic): 2504-4990

Publication date Created: December 2023

Publication date (Electronic): November 20 2023

Volume: 5

Issue: 4

Pages: 1680-1716

Article

DOI: 10.3390/make5040083

SO-VID: 664e57a8-3d08-499f-bb7a-98734ae24e77

License:

https://creativecommons.org/licenses/by/4.0/
