Hi there, I'm Lak, and I lead the team that is putting together this course and this specialization. Welcome to Going Deeper and Faster. This is the fifth module in the third course of the Advanced Machine Learning on GCP specialization, and we're talking about images.

Neural networks aren't new, and deep neural networks aren't new either. As we discussed in the courses on Launching into Machine Learning and on the Art and Science of ML in the first specialization, a variety of small innovations made deep neural networks possible. The discovery of convolutional neural networks, in particular, allowed for practical, high-accuracy models on images. People quickly discovered that the more layers an image model had, the better it performed, but only up to a point. In this module, we'll focus on the problems that deep learning researchers encountered when they tried to train deeper, better-performing neural networks, and how the effects of those problems can be mitigated. You will learn how to train deeper, more accurate networks, and how to do such training faster. You will learn about common problems that arise when training deeper networks and how researchers have been able to address them. The first of these problems is called internal covariate shift; a technique called batch normalization helps address it, and you will learn how to implement batch normalization in deep neural networks. The next big advance was adding shortcut connections and repeated structures to neural networks, so you will learn how to do that as well; there's a short code sketch at the end of this section that previews both ideas. As networks get deeper, training them takes longer and longer, so you will learn how to train deep networks faster on Tensor Processing Units, or TPUs. You will learn how to write a custom Estimator for TPUs. Finally, you will learn how to automate network design using neural architecture search.

CNNs were introduced in the 1990s, and they proved quite effective at handwriting recognition. But what really jump-started the deep neural network revolution was AlexNet, in 2012. AlexNet was a neural network with eight learned layers: five convolutional and three fully connected, with max pooling after some of the convolutional layers and a softmax output. It proved to be very effective even on more complex image classification tasks. On the top-5 error rate, a common benchmark in object recognition and computer vision, AlexNet reduced the error on the ImageNet competition dataset from 26 percent to 15 percent. AlexNet was fundamentally different. Whereas previous contestants in this competition had used traditional image processing and machine learning techniques, AlexNet proved that deep neural networks could not only compete, but actually win. Several factors were responsible for the revival of CNNs, and we've talked about these earlier. First, the availability of a much larger training set, and this is where ImageNet comes in with millions of labeled examples. Second, hardware accelerators in the form of GPUs that made the training of larger models practical. Third, tricks such as dropout to add better model regularization.

So, why do CNNs work so well? One way to understand why they work is to project the layer activations back into the input pixel space and show the top activations, that is, which pixels get activated by which neurons. This graphic by my colleague Chris Olah illustrates what a network might learn in each layer. The initial layers detect primitives like colors and edges.
A neuron in these first layers has a small receptive field, so what you're seeing in the visualization is actually a tiling: what the neuron sees is a black dot surrounded by light yellow or white. The next layers combine these black dots with the yellows and whites in a hierarchical manner to start to identify corners, curves, and textures. These textures are now more complex. They're not as local as they were in the previous layer, because receptive fields from the previous layer are being combined, but ultimately these features are still tied to a particular part of the image. Later layers put these building blocks together to start to identify recognizable aspects of the category being classified. For example, when you look at the features for the dog images, you see that discriminative features like eyes and fur are being emphasized. The eyes started off in the first layer as small dots surrounded by yellows, they got combined into larger textures, and now, in the final layer, you're basically seeing eyes and fur, things that are very discriminative of a dog. So, what this visualization technique shows is which parts of images individual filters respond to, and it demonstrates that what CNNs learn is a hierarchy of features. See the link shown in this video for more analysis of what the later neural network layers do and what they pick up on.

Based on this kind of visualization, researchers at New York University were able to improve upon AlexNet, specifically by using a smaller receptive window and a smaller stride, and they created a network called ZF Net, which won the ImageNet competition in 2013. When 2014 rolled around, VGGNet, from a pair of researchers at Oxford, and GoogLeNet, from an outfit based in California, showed that deeper models yielded higher accuracy. VGGNet had 19 layers and GoogLeNet had 22 layers, almost three times as many layers as AlexNet. But why stop at 19 or 22 layers? Why not 150 layers? More is better, right? Unfortunately, when researchers tried to train deeper networks, they found that deeper networks take a very long time to train and are very sensitive to hyperparameters, so research shifted to finding methods that would allow for robust training of really deep neural networks. Of course, as the number of layers goes up, the number of weights that need to be optimized also increases quite dramatically, and the size of the dataset increases dramatically too. So, more layers, more weights, bigger datasets: we need methods that allow us to train really deep networks on really large datasets much, much faster. The problems that occur in deep networks, and the ways to get robust, fast training despite these problems, are what this module is about.
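To make the batch normalization and shortcut-connection ideas mentioned at the start of this module a little more concrete before we dig into them, here is a minimal Keras sketch. This is not the course's lab code; the layer sizes, the 32x32 RGB input, and the ten-class output are made up purely for illustration. It just shows where a batch normalization layer and a shortcut (add) connection typically sit inside a convolutional model.

```python
# Minimal, illustrative sketch (not the course's lab code) of two ideas from this
# module: batch normalization after a convolution, and a shortcut connection that
# adds a block's input back to its output.
import tensorflow as tf

def conv_bn_relu(x, filters):
    """Convolution, then batch normalization, then a ReLU activation."""
    x = tf.keras.layers.Conv2D(filters, kernel_size=3, padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)  # normalizes activations over each mini-batch
    return tf.keras.layers.Activation("relu")(x)

def residual_block(x, filters):
    """Two conv + batch-norm layers wrapped by a shortcut connection."""
    shortcut = x
    y = conv_bn_relu(x, filters)
    y = tf.keras.layers.Conv2D(filters, kernel_size=3, padding="same", use_bias=False)(y)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Add()([shortcut, y])  # the shortcut: add the block's input back in
    return tf.keras.layers.Activation("relu")(y)

# A tiny example model; the 32x32 RGB input and 10 classes are hypothetical.
inputs = tf.keras.Input(shape=(32, 32, 3))
x = conv_bn_relu(inputs, 64)
x = residual_block(x, 64)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
```

The lessons that follow explain why the normalization step and the shortcut connection make deeper networks trainable, and how to train models like this faster on TPUs.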