Hi there, and welcome to Dealing with Data Scarcity. My name is Max and I'm a technical curriculum developer at Google Cloud. In this module, we'll focus on data scarcity: what it is and why it matters, before moving on to what you can do about it. You'll learn how to calculate the number of parameters in ML models and understand why our data needs grow with the number of parameters, how to add data augmentation to a model and what impact it has, how to apply your understanding of CNNs to the concept of task dependence, and how transfer learning decreases the need for data.

A common problem in machine learning is not having enough labeled data to train your model. Mobile devices, IoT sensors, and other devices outfitted with cameras have made data much more freely available, but at the same time models have grown more complex, and as a result our data needs have grown too. Remember that model training consists of initializing our parameters at more or less random values and then estimating the best value for each one. Generally speaking, the more parameters we have, the more data we need.

So, let's consider the number of parameters in the models we've seen so far. The code for our linear model looks like this. Our linear model has a weights variable and a bias term, and its output is the product of our input with our weights, plus the bias. How many parameters are in our linear model? The correct answer is height times width times nclasses, plus nclasses. The weights variable w has height times width times nclasses elements; for perspective, on our MNIST task, that comes out to 7,840 parameters. Adding the bias term b, which has nclasses elements, brings the total to 7,850 parameters.

The DNN model consists of three fully connected layers whose sizes decrease as we progress through the network, followed by a final layer that linearly combines the nodes in the second-to-last layer into nclasses different nodes. How many parameters are in the h1 layer? The correct answer is height times width times 300, plus 300. Keep in mind, we're adding weights between every one of our height times width inputs and our 300 neurons, and each neuron also has its own bias term. If you do the math across the entire network and add it all up, it comes to about 270,000 parameters, or over 30 times the number in our linear model.

As we mentioned, one of the benefits of using a CNN is that it requires fewer parameters than a comparably performing DNN. How many does it actually use? Let's take a deeper look. Here's a CNN model: we have two convolutional layers, each followed by a pooling layer to further reduce the size of the network, and then finally a dense layer with 300 neurons before the classification layer. How many parameters are in the c1 layer? The first two answers should strike you as odd, because they suggest that the number of parameters in a given layer of a CNN grows with the size of the input. But think back to how a CNN works: we convolve kernels over our input. The kernels are our parameters; the input's height and width show up in the size of the output, not in the number of parameters. In this case, the correct answer is 10 times 9, plus 10, where 10 is the number of filters and 9 is the volume of each filter, which is 3 times 3 times 1. This calculation was done for our MNIST example, but let's see how this approach generalizes to inputs with more than one channel.
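Before we generalize, here is a minimal sketch in Python of the counting we just did. It assumes the MNIST figures used above (28-by-28 single-channel inputs, 10 classes, an h1 layer of 300 neurons, and 10 filters of size 3 by 3 in c1); the helper functions dense_params and conv_params are illustrative, not part of the course code.

```python
# A minimal sketch of the parameter counts discussed above, assuming
# MNIST-sized inputs: height = width = 28, nclasses = 10.

def dense_params(n_inputs, n_outputs):
    # Weights between every input and every output, plus one bias per output.
    return n_inputs * n_outputs + n_outputs

def conv_params(kernel_h, kernel_w, depth_in, n_filters):
    # Each filter has kernel_h * kernel_w * depth_in weights, plus one bias.
    return kernel_h * kernel_w * depth_in * n_filters + n_filters

HEIGHT, WIDTH, NCLASSES = 28, 28, 10

# Linear model: one dense layer from the flattened image to the classes.
print(dense_params(HEIGHT * WIDTH, NCLASSES))   # 7850

# DNN: just the h1 layer, with its 300 neurons.
print(dense_params(HEIGHT * WIDTH, 300))        # 235500

# CNN: just the c1 layer, 10 filters of size 3x3 over a 1-channel input.
print(conv_params(3, 3, 1, 10))                 # 100
```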
To compute the number of parameters in a convolutional layer, first compute the number of parameters per filter: the height of the kernel, times the width of the kernel, times the depth of the kernel, where the depth of the kernel is equal to the depth of the layer that comes before it. Then, multiply the number of parameters per filter by the number of filters. Finally, add a bias term for each filter.

Ultimately, if you compute the number of parameters for the entire network, you'll see the total is about 300,000. You might think, didn't he say that CNNs have fewer parameters than DNNs? I did say that. The 300,000 figure is mostly a function of the size of the fully connected layer that follows our convolutional layers; about 90 percent of all the parameters in this model come from that one layer. If you compare the number of parameters in the convolutional portion against the fully connected portion, the latter has many times more.

So far we've looked at small examples; real-world models have significantly more parameters. Here are some of the highest-performing image models of the last few years and the number of parameters in each. We'll go over ResNet and GoogLeNet in greater detail in a later module. Tens of millions of parameters is a lot. What do you do when the number of labeled data points you have is far smaller than the number of model parameters you want to fit? Well, if you have some labeled data already, there are two common methods of attacking the problem of data scarcity. The first method, called data augmentation, addresses the problem by creating more data. The second method, called transfer learning, addresses the problem by making the model more data efficient, reducing its need for data.
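As a preview of where we're headed, here is a minimal sketch of both ideas in TensorFlow/Keras. The specific choices, random flip, rotation, and zoom layers for augmentation and a frozen MobileNetV2 base with a small classification head for transfer learning, are illustrative assumptions, not the code we'll build later in the course.

```python
import tensorflow as tf

# 1) Data augmentation: manufacture more training data by randomly
#    transforming the labeled examples we already have.
augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

# 2) Transfer learning: reuse a base model pre-trained on a large dataset
#    and train only a small classification head, so far fewer parameters
#    have to be estimated from our scarce labeled data.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pre-trained weights

model = tf.keras.Sequential([
    augmentation,                              # active only during training
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),  # e.g. 10 classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```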