Welcome to the ImageNet Bundle of Deep Learning for Computer Vision with Python, the final volume in the series. This volume is meant to be the most advanced in terms of content, covering techniques that will enable you to reproduce the results of state-of-the-art publications, papers, and talks. To help keep this work organized, I've structured the ImageNet Bundle into two parts.
In the first part, we'll explore the ImageNet dataset in detail and learn how to train state-of-the-art deep networks, including AlexNet, VGGNet, GoogLeNet, ResNet, and SqueezeNet, from scratch, obtaining accuracies as close as possible to those reported in their respective original works. To accomplish this goal, we'll need to call on all of our skills from the Starter Bundle and Practitioner Bundle.
The second part of this book focuses on case studies: real-world applications of deep learning and computer vision to solve a particular problem. We'll start by training a CNN from scratch to recognize the emotions and facial expressions of people in real-time video streams. From there, we'll use transfer learning via feature extraction to automatically detect and correct image orientation. A second case study on transfer learning (this time via fine-tuning) will enable us to recognize over 164 vehicle makes and models in images. A model such as this one could enable you to create an “intelligent” highway billboard system that displays targeted information or advertising to drivers based on the type of vehicle they are driving. Our final case study will demonstrate how to train a CNN to predict the age and gender of a person in a photo.