So what I'm going to do is start RStudio. When I start RStudio, it has four boxes on it. If your box on the left is empty, don't worry. One thing I would like you to do, if possible, when you run this, is to use the broom icon you have there and clear all the data, so that you start from a clean slate. So it says the Environment is empty. Now, I will close all these files I have open, just to show you that the fourth quadrant of the software does disappear. 
This time we're going to use a file called a script file. My script file is stored in this directory of mine, which is deep in my computer, and it opens up. So once again, if you look at it, you've got four windows, but this time in your top-left window you have some commands written in R. These commands have been written for you already, and they do best subset regression. All you need to know is that this is called a script. What is a script? I have written the commands and stored them for you, and you are now going to run the commands I have saved. You don't have to know what the commands do, but eventually you start reading them and you start understanding. So this is learning by doing. 
So let's start. I'm going to step through these commands so it's easy for you to follow. To step through them, there is a Run icon out here; every time you hit it, it runs the current line. If you hit this, it'll run the entire script. So let's try this. First step, it installs a package called 'leaps.' Next step, it installs a package called 'caTools.' But installing a package is not enough. Installing puts the package on your machine; to actually use it, you use the library command, as you did before. You have to say library leaps. That loads it, so it's now available for you to use, and then you say library 'caTools'. 
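The install-then-load steps can be sketched roughly as follows. This is a minimal sketch, not the lecturer's actual script: the check that skips installation when a package is already on the machine is an addition for convenience, and it assumes you have an internet connection and a CRAN mirror configured.

```r
# Packages named in the lecture.
pkgs <- c("leaps", "caTools")

for (p in pkgs) {
  # Installing puts the package on your machine (only needed once).
  # The requireNamespace() check is an addition, not part of the lecture.
  if (!requireNamespace(p, quietly = TRUE)) {
    install.packages(p)
  }
  # Loading makes the package's functions available in this session.
  library(p, character.only = TRUE)
}
```

In an interactive session you would more commonly just type `library(leaps)` and `library(caTools)`; the loop form is only there to keep the sketch short.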
So at this stage, we have loaded all the libraries we need for what we want to do. The next command gets the working directory of your RStudio. So it says C, Users, 20075, Documents. That is the working directory where all your files are. It is possible that the Toyota file is not there. So you have two options. You can copy the Toyota file, which I'll do in a second. The Toyota file for me is here, the CSV file. I copy this and paste it in my Documents, there you go. I've pasted it, I've replaced the file. Fine. So I pasted it in my directory. Your working directory may be different, so what you have to do is get the CSV file and put it in that working directory. Next is a read command. When I run it, if you get an error here, it means the Toyota file has not been copied to the right directory. So if you get an error here, just go back, download your CSV file, and put it in your working directory. 
The next command helps you split the dataset in the ratio of 80 to 20. The command is self-explanatory. Here's the dataset and this is the split ratio, fine. The only thing which is a little confusing is: what is sample? Let's run it. As you notice, it creates this variable called sample. It's what is called a logical variable: it takes the value true or false. On 80 percent of the rows of this dataset it takes the value true, and on 20 percent it takes the value false. So when I run the next command, which is train equal to subset of Toyota, comma, sample equal to true, you will notice it creates the train dataset, which has 1149 observations, which is 80 percent of 1436. How did it do it? It found the subset of Toyota where the logical variable took the value true. When you run the next command, it does the same thing and creates a subset for testing, which has 20 percent of the data, or 287 observations, as you can see. 
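To make the role of the logical variable concrete, here is a small sketch of the working-directory check and the 80/20 split. Several things here are assumptions rather than the lecturer's code: the file name `ToyotaCorolla.csv` is hypothetical, the stand-in data is made up so the sketch runs without the file, and the split is done with plain base R instead of whichever split function the script calls.

```r
# Where does RStudio look for files? The CSV must sit in this folder.
wd <- getwd()
print(wd)

# "ToyotaCorolla.csv" is a hypothetical file name; use the name of your own file.
csv_path <- file.path(wd, "ToyotaCorolla.csv")

if (file.exists(csv_path)) {
  toyota <- read.csv(csv_path)
} else {
  # Made-up stand-in data with 1436 rows, so the sketch still runs without the file.
  message("CSV not found in ", wd, "; using made-up stand-in data instead.")
  set.seed(1)
  toyota <- data.frame(
    Price = round(rnorm(1436, mean = 10000, sd = 3000)),
    Age   = sample.int(80, 1436, replace = TRUE)
  )
}

# Build the logical variable: TRUE for 80 percent of the rows, FALSE for the rest.
set.seed(123)
n       <- nrow(toyota)
n_train <- round(0.8 * n)          # with 1436 rows this is 1149
sample  <- rep(FALSE, n)
sample[sample.int(n, n_train)] <- TRUE

# Rows flagged TRUE go to train, rows flagged FALSE go to test.
train <- subset(toyota, sample == TRUE)
test  <- subset(toyota, sample == FALSE)

cat("train rows:", nrow(train), " test rows:", nrow(test), "\n")
```

With 1436 rows, this produces 1149 training rows and 287 test rows, the same counts quoted in the lecture, although which particular rows land in each set depends on the random seed.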
What sample does is label each of the rows as true or false. Those which are true, it puts in a dataset called train, and those which are not true, it puts in a dataset called test. So sample is a logical variable which simply says whether a row is train or test. If it is true, it is the training dataset. If it is false, it's the testing dataset. 
So the next step is to run the model. If you notice, it has run the model, and it says max nine. That means you can use at most nine variables in the model, and it says the method to use is forward. There are three methods you can use: as I said, forward, or you can use backward, or you can use exhaustive. So if you just change the label out there, you can use any one of the three methods we discussed. So it has already run the model, and when you hit the next command, it gives you this output. So what does this output show? Each of these rows is one of the models it has fitted. So it has fitted nine models in total. In the ninth model all the variables are there, as you can see, except Doors. In the first model, it includes Age. The second model, you've got Age and Weight, and so forth. So each one explains more and more of the data, as you can see. So summary models gives you the summary of all the models you have fitted so far. 
Next, I need to select one of the models, and we're going to use one of the criteria. The criterion I'm going to use is the adjusted R-squared. So what I'm going to do is take these values and create a list of values called residual sum. The residual sum basically gives you all the data you know about the residuals, as you can see. Right? So it gives you the adjusted R-squared, it gives you Mallows' Cp, and it gives you the information criterion. Let's go back. The last command, and I'm again explaining here, asks: which is the best model? So we're saying the best model is the one which has the highest adjusted R-squared. Now, do you need to know these commands? 
No, somebody technical has done them for you. What we have just seen is what is known as an R script. You may like to step through it yourself several times, seeing how it reads the data and how, as it proceeds, it keeps adding variables on your right-hand side at the top. Finally, it also gives you the output.
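If you do want to read the model-fitting and model-selection commands, they might look roughly like the sketch below. It assumes the `leaps` package is installed and uses its `regsubsets()` function with forward selection. The data here is a small made-up stand-in with only three predictors (`Age`, `Weight`, `Doors`), so the maximum model size is set to three rather than the nine used in the lecture, and the names `models`, `res.sum`, and `best` are illustrative choices, not necessarily those in the lecturer's script.

```r
library(leaps)  # assumes the package is already installed

# Made-up stand-in for the training data; not the real Toyota file.
set.seed(42)
n <- 200
train <- data.frame(
  Age    = runif(n, min = 1, max = 80),
  Weight = rnorm(n, mean = 1100, sd = 50),
  Doors  = sample(2:5, n, replace = TRUE)
)
train$Price <- 20000 - 150 * train$Age + 10 * train$Weight +
  rnorm(n, sd = 1000)

# Forward selection: one model per size, up to nvmax variables.
# The lecture uses nine; the stand-in data only has three predictors.
# method can also be "backward" or "exhaustive".
models <- regsubsets(Price ~ ., data = train,
                     nvmax = ncol(train) - 1, method = "forward")

# One row per model size, showing which variables each model includes.
print(summary(models))

# Keep the summary object so the selection criteria can be compared.
res.sum <- summary(models)
print(data.frame(
  adj_r2 = res.sum$adjr2,  # adjusted R-squared (higher is better)
  cp     = res.sum$cp,     # Mallows' Cp (lower is better)
  bic    = res.sum$bic     # Bayesian information criterion (lower is better)
))

# Best model by the criterion used in the lecture: highest adjusted R-squared.
best <- which.max(res.sum$adjr2)
print(best)
print(coef(models, best))
```

Swapping `which.max(res.sum$adjr2)` for `which.min(res.sum$cp)` or `which.min(res.sum$bic)` selects by one of the other criteria instead, and the three do not always agree on the same model.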