So in our introduction to multiple linear regression, we said that we have p variables, and so we have p beta coefficients plus our original beta zero, which doesn't have an X attached to it. And then we have our error term on the end. That's the standard multiple linear regression: Y equals beta zero plus beta one X one, all the way up to beta p X p, plus epsilon. Now, the relationship between each X_i and Y is an important concept to get introduced to. If X_i doesn't have any noticeable effect on Y, then its beta term will be zero, meaning that as we increase or decrease that X, it has no effect on our target Y. So essentially, if a beta has a value of zero or something near zero, then the effect of that specific X on Y is very small, to the point that it might not be adding anything to our model. It might actually be making our model worse. And so we want to start to explore this. Say we have something like sales as a function of the TV, radio, and newspaper budgets. If increasing or decreasing the newspaper budget does nothing to our sales, no matter how much we change it, then why is newspaper even in our model in the first place? So the first thing we want to do when we have a standard multiple linear regression with a bunch of variables, p variables, is start asking ourselves: does any of this even make sense? Does this model make sense? Does the relationship between any of these Xs and that Y make any sense, or are all the betas zero? If I have a bunch of variables like height, weight, BMI (which is just some function of the first two), eye color, and hair color as my X variables, and then my Y variable is the weather, something ridiculously unrelated to any of my Xs, then all of my betas will be zero. It doesn't matter how tall a person in my data set is, how much they weigh, or what their eye color is; the weather outside is not going to change at all.
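To make this concrete, here's a minimal sketch in Python. Everything here is simulated: the budget ranges, the 0.05 and 0.10 coefficients, and the noise level are all made up for illustration. We build sales from TV and radio only, include a newspaper column that has no real effect, and check that its fitted coefficient comes out near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
tv = rng.uniform(0, 100, n)        # hypothetical ad budgets
radio = rng.uniform(0, 50, n)
newspaper = rng.uniform(0, 30, n)  # deliberately unrelated to sales
sales = 5.0 + 0.05 * tv + 0.10 * radio + rng.normal(0, 0.5, n)

# Least-squares fit; columns are [intercept, tv, radio, newspaper]
X = np.column_stack([np.ones(n), tv, radio, newspaper])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
# beta[3], the newspaper coefficient, lands near zero
```

The fit recovers betas close to the true 0.05 for TV and 0.10 for radio, while the newspaper coefficient hovers around zero, exactly the "this variable isn't adding anything" situation described above.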
Alright, so when I put together this model of physical features and weather and run the data through it, it's going to say, well, all these betas are zero: no matter what you do to these physical features, they don't have an actual relationship to the weather outside. So we want to know the relationship between each X_i and Y. We want to know, first of all, are any of these beta coefficients actually useful? Are they non-zero? Do they actually have a meaningful effect on our Y variable? So we can do a hypothesis test right here: a null hypothesis and an alternative hypothesis. Again, this is something you probably should have seen in a statistics course at some point. This whole setup of hypothesis testing, and the confidence intervals we'll do next time, isn't really make or break. If you don't have previous experience with hypothesis testing and confidence intervals, it's okay; this is still something you can move forward with. We're not going to be testing on this or doing anything intense with it; this is more so the setup of it. So in general, if we want to know the relationship between X_i and Y, we want to test: are any of these betas useful, or are they all zero? The null hypothesis is that they're all zero; if so, they're not really useful and we're in some trouble. But if the alternative hypothesis turns out to be true, that at least one beta is non-zero and therefore useful to us, then, let's say it's only one, whatever that one beta is, the X attached to that beta is useful in determining a relationship to the target Y. And if we want to do this hypothesis test, as we know from other statistics classes, or at least a foundational one, we will be using an F-statistic. But this F-statistic would just tell us that at least one beta is not zero. So now we know something: this model is somewhat useful, there's some relationship. It's very vague, but at least it's pointing us in the right direction.
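Here's one way to sketch that overall F-test with plain NumPy. The function name is my own, and I'm just computing the standard statistic F = ((TSS − RSS)/p) / (RSS/(n − p − 1)), which tests the null hypothesis that every slope coefficient is zero:

```python
import numpy as np

def overall_f_statistic(X, y):
    """F-statistic for H0: beta_1 = ... = beta_p = 0.
    X is an (n, p) matrix of predictors WITHOUT an intercept column."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    rss = resid @ resid                            # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)              # total sum of squares
    return ((tss - rss) / p) / (rss / (n - p - 1))
```

If Y really depends on at least one of the Xs, this comes out much larger than 1; under the null it hovers around 1. To turn it into a p-value you'd compare it against an F distribution with (p, n − p − 1) degrees of freedom.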
Next, we would ask ourselves which betas are actually important. The F-statistic tells us that at least one of them is important, but it doesn't tell us which one, or which ones if it's more than one. It just says that at least one is useful. So now we need to dig in: is it one, is it more than one, and which ones are useful? Now, there are three methods at the beginning level. There are more advanced methods to do this later, but right now these are three great methods: forward selection, backward selection, and mixed selection. What happens in forward selection is you start with the null model and you fit p simple linear regressions. So you fit Y equals beta zero plus beta one X one; that baseline term is always going to be in there in simple linear regression. You'll also fit Y equals beta zero plus beta two X two, and so on for each one of the p variables you are dealing with. So you start with nothing, you fit p separate simple linear regressions, and whichever of those p models has the lowest RSS, which we've spoken about many times, so we know what RSS is, we add that corresponding variable to our model. So let's say the model with X two had the lowest RSS: we would take that variable X two, and our model would be Y equals beta zero plus beta two X two. Then we do it again. Once that variable is in, it stays in the best-model-so-far, and we try adding each remaining variable on top of it. Maybe next time the model adding X four has the lowest RSS, so we add beta four X four to the model. And we just repeat this until some criterion, usually called a stop condition, is satisfied.
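The loop described above can be sketched like this. The helper names are my own, and I'm using a fixed maximum number of variables as the stop condition:

```python
import numpy as np

def rss(cols, y):
    """RSS of the least-squares fit of y on an intercept plus cols."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

def forward_selection(X, y, max_vars):
    """Start from the null model; at each step add the variable whose
    addition gives the lowest RSS, until max_vars are in the model."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_vars:
        best = min(remaining,
                   key=lambda j: rss([X[:, k] for k in selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```

On the first pass this fits exactly the p simple linear regressions from the lecture; on later passes each candidate is fit on top of the variables already selected.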
So you might have: well, we're going to do this until we have five variables, or until some other condition is met. Okay, that's forward selection. Backward selection works the opposite way. We start with all of it: if we have p variables, we start with our full model, right, Y equals beta zero plus beta one X one, all the way up to beta p X p. We start with this full model, we get our beta estimates and all that, and we remove the least significant variable. Maybe one of them is the least significant, so we take it out. Then we repeat this: maybe another of the beta X terms is now the least significant, so we take it out, then another one, and we're left with our original model minus the terms we've been slashing out, the least significant ones one after another. And we just repeat this until some stop condition is satisfied. Okay, so that's backward selection. Mixed selection is, as the name implies, a mix between forward and backward. We start out by doing forward selection until some stop condition or some general condition, so that we get a good model; then we're done with the forward selection part. Then we go through the model that came out of our finished forward selection and we start slashing, removing the variables with large p-values. So we started with nothing, we added a bunch through forward selection, then we stopped the forward selection and started slashing the bad parts. Now we have a slightly smaller model, and then we do this again.
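Backward selection can be sketched the same way. In the lecture, "least significant" would usually mean the largest p-value; as a simple proxy here I drop the variable whose removal increases RSS the least, and the helper and function names are again my own:

```python
import numpy as np

def rss(cols, y):
    """RSS of the least-squares fit of y on an intercept plus cols."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

def backward_selection(X, y, min_vars):
    """Start from the full model; repeatedly drop the variable whose
    removal increases RSS the least, until min_vars remain."""
    selected = list(range(X.shape[1]))
    while len(selected) > min_vars:
        worst = min(selected,
                    key=lambda j: rss([X[:, k] for k in selected if k != j], y))
        selected.remove(worst)
    return selected
```

Dropping a useless variable barely changes the RSS, while dropping a genuinely related one makes it jump, which is why this proxy tracks "least significant" reasonably well.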
We do another round of forward selection, then start slashing; forward selection, then slashing. And the general stop condition is important, because mixed selection is actually a pretty cool concept: the stop condition for mixed selection is that all the variables in your model have a small p-value. And there's a second stop condition: not only do all the variables in your model have a small p-value, but if you were to add any variable not currently in your model, it would have a large p-value. Once both of those conditions are satisfied, your mixed selection is done. Again: we do forward, then we slash a few, forward, slash, forward, slash, until we think we're at the stop condition. The stop condition is that every variable in my model has a small p-value, and if I were to add back any of the ones I've slashed out of the model, they would have a large p-value. Okay, so mixed is pretty useful. I like mixed quite a bit. Maybe I'm a little biased.
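Putting both stop conditions together, mixed selection can be sketched as below. Instead of exact p-values I use the common rule of thumb that |t| > 2 roughly corresponds to p < 0.05; that threshold, and all the function names, are my own choices for illustration:

```python
import numpy as np

def abs_t_stats(cols, y):
    """Absolute t-statistics of the slopes in a least-squares fit of y
    on an intercept plus cols (the intercept's own t is excluded)."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return np.abs(beta[1:]) / se[1:]

def mixed_selection(X, y, t_thresh=2.0):
    """Alternate forward and slashing steps: add the best candidate
    whose |t| clears the threshold, then drop any in-model variable
    that falls below it. Stop when neither step can act -- i.e. all
    in-model variables look significant, and anything we could add
    back would not."""
    selected, remaining = [], list(range(X.shape[1]))
    while True:
        changed = False
        # forward step: best candidate above the threshold, if any
        best_j, best_t = None, t_thresh
        for j in remaining:
            t = abs_t_stats([X[:, k] for k in selected + [j]], y)[-1]
            if t > best_t:
                best_j, best_t = j, t
        if best_j is not None:
            selected.append(best_j)
            remaining.remove(best_j)
            changed = True
        # slashing step: drop the weakest variable if it fell below
        if selected:
            ts = abs_t_stats([X[:, k] for k in selected], y)
            i = int(np.argmin(ts))
            if ts[i] < t_thresh:
                remaining.append(selected.pop(i))
                changed = True
        if not changed:
            return selected
```

The `while` loop only exits when both of the lecture's conditions hold at once: every variable in the model clears the threshold, and no variable outside it would.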