Have you ever wondered about the history of vision transformers?

We just published a course on the freeCodeCamp.org YouTube channel that is a conceptual and architectural journey through deep learning vision models, tracing the evolution from LeNet and AlexNet to ResNet, EfficientNet, and Vision Transformers. Mohammed Al Abrah created this course.

The course explains the design philosophies behind skip connections, bottlenecks, identity preservation, depth/width trade-offs, and attention. Each chapter combines clear visuals, historical context, and side-by-side comparisons to reveal why architectures look the way they do and how they process information.
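To give a flavor of one of those ideas, here is a minimal sketch (not taken from the course) of the skip-connection pattern behind ResNet: the block computes `x + f(x)`, so when the learned transform contributes nothing, the input passes through unchanged, which is what "identity preservation" refers to.

```python
# Conceptual sketch of a residual "skip connection": output = x + f(x).
# If the learned transform f contributes nothing, the block reduces to
# the identity, so information is preserved through depth.

def residual_block(x, transform):
    """Apply a transform to x and add the input back (the skip connection)."""
    return [xi + ti for xi, ti in zip(x, transform(x))]

# A transform that learns nothing: the block passes x through unchanged.
identity_out = residual_block([1.0, 2.0, 3.0], lambda x: [0.0] * len(x))
print(identity_out)  # [1.0, 2.0, 3.0]

# A simple nonlinear transform (ReLU of a scaled input), an assumed
# stand-in for a real convolutional layer.
relu_out = residual_block([1.0, -2.0, 3.0],
                          lambda x: [max(0.0, 0.5 * xi) for xi in x])
print(relu_out)  # [1.5, -2.0, 4.5]
```

In a real network the transform would be a stack of convolution, normalization, and activation layers, but the add-the-input-back structure is the same.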

Here are the sections covered in this course:

  • Welcome and Introduction

  • What We'll Cover Broadly

  • LeNet Architecture Model

  • AlexNet Architecture Model

  • VGG Architecture Model

  • GoogLeNet / Inception Architecture Model

  • Highway Networks Architecture Model

  • Pathways of Information Preservation

  • ResNet Architecture Model

  • Wide ResNet Architecture Model

  • DenseNet Architecture Model

  • Xception

  • MobileNets

  • EfficientNets

  • Vision Transformers and The Ending

Watch the full course on the freeCodeCamp.org YouTube channel (5-hour watch).