So, how do you get out-of-sample data? You may have sales data from past years to train your algorithm on now, but you may have to wait another year to get a whole new batch of out-of-sample data, or maybe it's a month, or six months, or less. The point is that you have to wait some period of time for more data to come in so you can test. Well, what if you don't want to wait that long? Or you've committed to a delivery to your Vice President: we're going to have this algorithm up and working by such-and-such a date, we can't wait 30 days, and we can't wait a year, right? Well, there are ways to address that.

So, one approach is to divide your in-sample data into two parts: one you use for training, and one you use for testing. I drew a quick little picture over there on the board during the machine learning section. For example, for the sales data case, you could use a certain date as a dividing line between in-sample data and out-of-sample data. Now, there's a much better way than taking a set of data that was sampled over time, chopping it at some date, and saying, "All right, all the data before this date is my training data, my in-sample data, and all the data after that point in time is going to be my out-of-sample data." We'll get to that in a second, but first I want to talk about learning curves.

So, a learning curve is a plot of the error of a learning algorithm with respect to the quantity of data used for training. It can help you visualize whether an algorithm is suffering from bias or variance problems. So, you divide your dataset into an in-sample set and an out-of-sample set, say 70/30 or 90/10, for example. Okay? We will see that there is a better way to do this coming up. So, you create sub-portions of the in-sample data in, say, 10 percent increments; you train the model on each of those in-sample sub-portions, then run the out-of-sample data through each trained model, and you record and plot the error at each step for both the in-sample and out-of-sample data. Ideally, the two curves come close to a common error value; that would be the goal.

So, here's an example of a learning curve. We have the amount of error here on the Y-axis, and we had some convergence value in mind that we set for our particular problem; in my linear regression, I had to decide what that error value was, and I just learned it through trial and error, trying different thresholds for convergence. So, this red graph is the training curve, and the blue is the validation curve, the data used for validation, and they don't converge. So, there's more work that needs to be done here to find out why these didn't converge, because ideally, they would converge.

So, this one here shows an issue with bias; the error goal was one. I know you can't see this very well on here, but the in-sample and out-of-sample curves started differently, and they do converge, but they converge at an error value that's above the threshold we wanted: we wanted an error of one or less, and one was important for some reason. The error value they converged at was too high. So, we might have bias issues. Just think about the election results: all those election results were shifted off too far in one direction.
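Before we look at variance, here's a minimal sketch of the learning-curve procedure just described, in Python with scikit-learn. The synthetic sales-like data, the variable names, and the 70/30 split are illustrative assumptions, not something from the lecture; any regressor and any error metric would work the same way.

```python
# Minimal learning-curve sketch: train on growing sub-portions of the
# in-sample data, score each model on both its training subset and the
# held-out out-of-sample set, and plot the two error curves.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))           # hypothetical feature
y = 3.0 * X[:, 0] + rng.normal(0, 2, size=500)  # hypothetical noisy target

split = int(0.7 * len(X))                       # 70/30 in-sample / out-of-sample
X_in, y_in = X[:split], y[:split]
X_out, y_out = X[split:], y[split:]

fractions = np.arange(0.1, 1.01, 0.1)           # 10 percent increments
train_err, val_err = [], []
for f in fractions:
    n = int(f * len(X_in))
    model = LinearRegression().fit(X_in[:n], y_in[:n])
    train_err.append(mean_squared_error(y_in[:n], model.predict(X_in[:n])))
    val_err.append(mean_squared_error(y_out, model.predict(X_out)))

plt.plot(fractions * len(X_in), train_err, "r-", label="training error")
plt.plot(fractions * len(X_in), val_err, "b-", label="validation error")
plt.xlabel("number of training examples")
plt.ylabel("error (MSE)")
plt.legend()
plt.show()
```

Note that a plain index split like this mimics the date-based split from the sales example; as mentioned above, there are better ways to split time-ordered data, which we'll get to.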
Next is variance, the algorithm's susceptibility to variance. So, we've got the in-sample data down here with this darker color, and the point in this textbook was that for the out-of-sample data, the model starts to behave erratically with data that it's never seen before; the x-axis is the number of training examples. So, you would expect that as the volume of data goes up, these curves should come together. When they don't, that can be an indication that you have variance issues. Be aware that your model can have bias issues and it can have variance issues. Now, that summarizes what I just said: if the curves converge but the error they converge at is too high, you may have a bias problem; if they don't converge at all, you may have a variance problem.
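To make those two failure modes concrete, here's a hypothetical helper that applies exactly that reading to the final points of a learning curve. The gap tolerance `gap_tol` is an assumed, illustrative threshold, not a value from the lecture; in practice you'd pick it based on your problem, just like the error goal.

```python
# Hypothetical learning-curve diagnosis, following the rules above:
# a persistent gap between training and validation error suggests variance;
# curves that converge above the error goal suggest bias.
def diagnose(train_err, val_err, error_goal, gap_tol=0.1):
    final_train, final_val = train_err[-1], val_err[-1]
    if final_val - final_train > gap_tol:
        return "possible variance problem: validation error stays well above training error"
    if final_val > error_goal:
        return "possible bias problem: curves converge above the error goal"
    return "curves converge at or below the error goal"
```

With the curves from the earlier sketch, `diagnose(train_err, val_err, error_goal=1.0)` would flag the bias case the lecture describes, where the two curves meet but above the error of one we wanted.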