Calibration data is a dataset in which you observe both the predictors and the variable you are trying to predict. And one of the most common ways to get that kind of information is to go back into the past. How does a bank predict the chance of default on a loan? Well, they will look at people who were just like you a year ago, or five years ago, and see how many of those defaulted. How did Target build a scoring model to predict customer pregnancy? Because some customers went to Target, willingly, and disclosed they were pregnant. They did so by creating a baby registry. And once Target had that information about which customers were pregnant, and when, and what they bought or did not buy, they could mine the database to find patterns that could have predicted what they already knew for a sample of customers. After building and calibrating the predictive model, they could apply it to everyone in the database, and assess everyone's probability of being pregnant.
Now, let's forget about the pregnancy example for now, and focus on what we'll do with our datasets. In this course, you are going to predict how much money customers are going to spend over the next 12 months. And to do so, we are going to create calibration data. We are going back in time, 12 months ago, and extracting from that data two separate types of variables.
The first type of variables are called predictors. These are the bits of information we had about each customer at the time, as if we went back in time and had no clue about what they did over the next 12 months. For instance, we compute the recency, frequency, and monetary value of each customer as of a year ago. Actually, we've done it already in the previous module, when we ran our segmentation model retrospectively. So we have that already. The second type of variable is what happened after that. It's the variable we try to predict, and it's sometimes called the target variable.
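
To make the idea of going back in time concrete, here is a minimal sketch of how such a calibration dataset could be assembled. The course itself works in R; this is a Python illustration only, and the toy transactions, the cutoff date, and the function name `build_calibration` are all hypothetical. It assumes that monetary value means the average purchase amount and that the target is the total revenue over the 12 months after the cutoff.

```python
from datetime import date

# Hypothetical toy transactions: (customer_id, purchase_date, amount).
transactions = [
    ("c1", date(2013, 3, 10), 50.0),
    ("c1", date(2014, 6, 1), 30.0),
    ("c1", date(2015, 2, 15), 40.0),
    ("c2", date(2014, 11, 20), 100.0),
    ("c3", date(2012, 5, 5), 20.0),
    ("c3", date(2015, 9, 30), 60.0),
    ("c3", date(2015, 11, 2), 15.0),
]


def build_calibration(transactions, cutoff, horizon_end):
    """Predictors use purchases before `cutoff`; the target uses [cutoff, horizon_end)."""
    rows = {}
    # Pass 1: what we knew about each customer at the cutoff (the predictors).
    for cust, day, amount in transactions:
        if day < cutoff:
            row = rows.setdefault(
                cust, {"last": day, "frequency": 0, "total": 0.0, "revenue": 0.0}
            )
            row["last"] = max(row["last"], day)
            row["frequency"] += 1
            row["total"] += amount
    # Pass 2: what happened afterwards (the target), for known customers only.
    for cust, day, amount in transactions:
        if cutoff <= day < horizon_end and cust in rows:
            rows[cust]["revenue"] += amount
    return {
        cust: {
            "recency": (cutoff - row["last"]).days,  # days since last purchase
            "frequency": row["frequency"],           # number of past purchases
            "monetary": row["total"] / row["frequency"],  # average past amount
            "revenue": row["revenue"],               # target: next-12-month spend
        }
        for cust, row in rows.items()
    }


calibration = build_calibration(transactions, date(2015, 1, 1), date(2016, 1, 1))
```

Each customer ends up with one row: three predictors computed as if we were standing at the cutoff date, plus the revenue we already know they generated afterwards.
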
In our case, we'll try to predict whether a customer will remain active and make at least one purchase. And if so, how much money each customer is likely to spend. In other words, we predict both the probability and the likely amount. And because we'll build our calibration data by going back in time, we already know the answer, because we observed it for each and every customer.
Once we have built our calibration data, containing both the predictors and the target variable, the next step is to link the two through a statistical model. The model will predict customers' probabilities of purchase as well as the most likely amount. And in the process, it will reveal the importance of each predictor. The importance of each predictor will be revealed by what we call weights and their statistical significance. If a predictor's weight is large and statistically significant, it means it is a good predictor. If not, it means it contributes very little to the predictions. The best way to explain this is to dig into our R code and see how it's done in practice. And that's exactly what we are going to do in a few minutes.
But before we get there, let's remember something. What we are going to do is very typical in marketing applications. We are going to predict the probability that the customer will buy something. And if they do, predict how much money they'll spend. So, we are actually going to model two separate processes. The first one is to predict the probability. The second one is to predict an amount. To do so, we will build two separate models that we will combine later into one big scoring model. But the calibration data will not be exactly the same. To predict the probability of buying, we can use the entire database of customers, because we can observe, for all of them, whether they purchased something or not. If you bought something, we'll code it as a one. If not, we'll code it as a zero.
We can observe that behavior, or lack of behavior, for everyone and use it in our calibration data. For the amount spent, however, it's a different story. If a customer hasn't purchased anything over the last 12 months, we have no way of knowing how much she would have spent, had she made a purchase. So, for the first model, the one that predicts the probability of purchase, we'll use the entire database to calibrate our predictive model. For the second model, the one that predicts how much money you'll spend if you buy something, we will only include in our calibration data those customers who purchased something over the last 12 months, and ignore the others. How to do that will be explained in detail in the next video.
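
As a rough preview of the split just described, the sketch below prepares the two calibration sets from one table. It is written in Python for illustration, whereas the course uses R, and the rows, field names, and the helper `expected_spend` are hypothetical. The lecture does not say how the two models are combined; multiplying the purchase probability by the predicted amount is an assumption made here.

```python
# Hypothetical calibration rows: predictors as of a year ago, plus the
# revenue observed over the following 12 months (the target).
calibration = [
    {"customer": "c1", "recency": 214, "frequency": 2, "monetary": 40.0, "revenue": 40.0},
    {"customer": "c2", "recency": 42, "frequency": 1, "monetary": 100.0, "revenue": 0.0},
    {"customer": "c3", "recency": 971, "frequency": 1, "monetary": 20.0, "revenue": 75.0},
]

# Model 1 (probability of purchase): keep every customer and code the
# target as 1 if they bought something, 0 if they did not.
probability_data = [
    dict(row, active=1 if row["revenue"] > 0 else 0) for row in calibration
]

# Model 2 (amount spent): keep only customers who actually bought something,
# since the amount is unobserved for everyone else.
amount_data = [row for row in calibration if row["revenue"] > 0]


def expected_spend(probability, amount):
    """Assumed combination rule: purchase probability times predicted amount."""
    return probability * amount
```

With this layout, the first model is calibrated on all three customers and the second on only the two who purchased; a customer with a 50% purchase probability and a predicted amount of 80 would score `expected_spend(0.5, 80.0)`, that is 40.
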