In this session, we're going to cover a couple of topics. One main idea is this: suppose we don't rely on R-squared or other statistical tests, but we just have some data that we keep aside for testing. How can we use that data to improve our models? That's one part of it. The second part is, how do we do regression when the response variable is not a continuous number, but a 0-1 response or a red-blue-green kind of response? We're going to cover these two big ideas in this session.

So here, I'll break this up into four parts. The first part may sound like splitting hairs, but it asks: is there a difference between making a prediction and explaining a phenomenon? That's the first thing we want to look at, and then we want to use the idea of a hold-out sample in this context. The second thing we want to do is, how do we use regression? If you want to predict, how will we use regression? How will we use the hold-out sample? How will we measure performance with the hold-out sample? Very quickly, we will pass through one other idea, which is, how do we improve our model by deciding which variables to keep in our regression and which variables not to keep, and why is that an important thing, especially with big data? The last thing would be, how do we extend our models to handle binary variables? So we'll be talking about a new type of regression called logistic regression.

So as you've seen already in the first two modules, we said one of our goals is to explain things: how the response is related to the predictor variables. This is the kind of regression you have probably done a lot already. We know how to fit the data using rattle and R, and we know how to examine the fit using statistical measures such as R-squared, visual tests, p-values, and so on.

Then we come to prediction. The goal is slightly different, right?
So what we are saying is, we have a new set of values for which we don't know what the response variable is, but we want to use the predictor variables to predict what the response will be and see how good our predictions are. Basically, we'll be developing the model on the training data, and we'll be assessing its performance on the hold-out data. For the first time in this course, you're going to look at why we split data into a training part and a validation part.
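
To make the training and hold-out idea concrete, here is a minimal sketch. The course itself works in R and rattle; this sketch uses plain Python instead, purely as an assumption for illustration. The synthetic data, the 70/30 split, the choice of root-mean-squared error as the performance measure, and the helper names (`split_holdout`, `fit_simple_regression`, `rmse`) are all hypothetical and not taken from the course.

```python
import math
import random


def split_holdout(rows, holdout_fraction=0.3, seed=0):
    """Shuffle the rows and return (training rows, hold-out rows)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def fit_simple_regression(train):
    """Least-squares fit of y = intercept + slope * x on (x, y) pairs."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(model, rows):
    """Root-mean-squared prediction error of the model on the given rows."""
    intercept, slope = model
    squared_errors = [(y - (intercept + slope * x)) ** 2 for x, y in rows]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic data (an assumption for illustration): y = 2 + 3x plus noise.
noise = random.Random(42)
data = [(i / 10, 2.0 + 3.0 * (i / 10) + noise.gauss(0, 1.0)) for i in range(100)]

# Develop the model on the training part only.
train, holdout = split_holdout(data, holdout_fraction=0.3, seed=1)
model = fit_simple_regression(train)

# Assess it on the hold-out part, which the fit never saw.
train_rmse = rmse(model, train)
holdout_rmse = rmse(model, holdout)
print("training RMSE:", round(train_rmse, 3))
print("hold-out RMSE:", round(holdout_rmse, 3))
```

The number to watch is the hold-out error: it estimates how the model would do on new values whose response we do not know, whereas the training error is computed on the same rows the model was fitted to.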