Okay. So here's our data, the ice-cream data. We've got quantity, price, temperature, number of conventions, and years. I want to create what's called a set of dummy variables. Dummy variables are dichotomous variables that take on a value of one in some situations and zero in others. So if I was doing a wage analysis and I had observations for males and females, and I wanted to see if there was a wage difference between them, holding things like education and experience constant, I could create a dummy variable for male. For observations of a male, that dummy variable takes on a value of one; for observations of a female, it takes on a value of zero. Then I can read that coefficient as how much more or less males earn relative to females. Rather than treating it as a continuous variable, I'm treating it as a state. So here are some males, here are some females.
In this case, we have three different years. Now, what I'm going to do is insert some columns here. I'm going to create a set of three dummy variables: one called 2001, one called 2002, and one called 2003. I'm doing this because I want to know how my consumption in 2001 and 2002 relates to 2003, or vice versa. Now, in order to set up these dummy variables, I'm going to use a series of if-then statements. So if you're not too familiar with Excel, I'm essentially saying, "Hey Excel, suppose that this cell over here is equal to 2001. In that case, I want you to give it a value of one rather than a zero. If it is not equal to 2001, I want you to give it a value of zero." I hit "Return" on this and I click this box that lets me fill the formula down. You'll notice what this little if-then statement does: if this cell is equal to 2001 it's a one, else it's a zero. So it gives me a one for every observation in 2001.
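If you'd rather see that if-then logic outside of Excel, here is a minimal sketch in Python. It is not the spreadsheet from the lecture: the `years` list is made-up illustrative data, and `make_dummy` is a hypothetical helper name chosen for this example.

```python
# Hypothetical year column; illustrative values, not the lecture's data.
years = [2001, 2001, 2002, 2002, 2003, 2003]


def make_dummy(values, state):
    """Return 1 where the value equals `state`, else 0.

    This mirrors the spreadsheet logic: if the cell equals the year, 1, else 0.
    """
    return [1 if value == state else 0 for value in values]


d2001 = make_dummy(years, 2001)
d2002 = make_dummy(years, 2002)
d2003 = make_dummy(years, 2003)

# Print each observation's year next to its three dummy values.
for row in zip(years, d2001, d2002, d2003):
    print(row)
```

Each row ends up with exactly one dummy set to one, the one that matches its year.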
But for the years 2002 and 2003, it gives me zeros. Now I've got a dummy variable for 2001: for 2001 it's a one, and for every other year it's a zero. I'm going to do the same thing for 2002. If this cell over here is equal to 2002, give it a one, otherwise give it a zero. Now, you'll notice my first observation here is a zero, because it's 2001, not 2002. But if I copy that down, it takes a one for every year that is 2002. I'm going to do the same thing for 2003. Let's say equals IF: if this cell over here is equal to 2003, give it a one, otherwise give it a zero. Then I'm going to drag this down, and you'll notice that it takes on a value of one for all the 2003 observations.
So now I'm going to run this regression using dummy variables. I'm first going to run it wrong, to show you this thing called the dummy variable trap. Then I'm going to run it correctly and get rid of the dummy variable trap. Here's what I mean. The dummy variable trap is a situation where I'm trying to include a dummy for every state, all three in this case. If it was male and female, I would be trying to include a 1, 0 for male and also a 1, 0 for female. Now of course, when I'm running a regression, the computer doesn't like it if I have variables that are too highly correlated with each other. It can't determine what to do. If I have, let's say, two states, male and female, and I try to include a 1, 0 dummy variable for both, the female dummy is perfectly correlated with the male dummy: whenever you're not a male you are a female, or vice versa. In this case, my 2001 is perfectly determined by 2002 and 2003: whenever it isn't 2002 or 2003, it must be 2001. Or 2003 is perfectly determined by 2001 and 2002: whenever it's 2003, it isn't 2001 or 2002. The computer doesn't like that. But I'll show you. If I try to run all of them, the computer will actually spit one of my dummy variables out. Right here. The one that it spit out was 2001. So what's going on in this model?
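Before reading the coefficients, here is a small Python sketch of why the trap happens. It assumes the regression includes an intercept, as the output in the lecture does, and it uses a made-up `years` list rather than the actual ice-cream data.

```python
# Hypothetical year column; illustrative values only.
years = [2001, 2001, 2002, 2002, 2003, 2003]
states = [2001, 2002, 2003]

# One 0/1 dummy column per year.
dummies = {s: [1 if y == s else 0 for y in years] for s in states}

# A regression with an intercept implicitly includes a column of ones.
intercept = [1] * len(years)

# Adding the three dummies row by row reproduces the intercept column exactly,
# so one column is redundant and the coefficients cannot be pinned down.
row_sums = [sum(dummies[s][i] for s in states) for i in range(len(years))]
print(row_sums == intercept)

# Dropping one dummy (here 2001, which becomes the benchmark) breaks the tie.
kept = [s for s in states if s != 2001]
kept_sums = [sum(dummies[s][i] for s in kept) for i in range(len(years))]
print(kept_sums == intercept)
```

With the trap in mind, back to the output in which 2001 was dropped.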
It says, look, on any given Sunday here, I'm going to consume something like 384 ice-cream cones, minus 120 cones for every unit of price increase, plus 1.78 cones for every one-unit increase in temperature, plus half a cone for every convention. Then here, 2001 is zero, and I've got 2002 and 2003. So let's interpret this output, because Excel wants to give you good output. Let's leave 2001 out of there and just have 2002 and 2003. So it's plus 45 ice-cream cones for 2002, minus 17 ice-cream cones for 2003. I'm going to write this one more time, I apologize: 384, minus 120 times price, plus 1.78 times temperature, plus 0.5 times conventions, plus 45 times the 2002 dummy, minus 17 times the 2003 dummy.
Okay. So how do I interpret this 45 and negative 17? I know how to interpret most of this other stuff, right? Negative 120: price goes up by one unit, I'm going to drop my consumption by 120 units. Temperature goes up by one unit, I'm going to increase my consumption by 1.78 units. Conventions go up by one, people increase their consumption by half a unit. But what about these dummy variables, 2001, 2002, 2003? As 2002 goes up, I increase it by 45 units, or as 2003 goes up, I decrease it by 17 units? Not exactly. For dummy variables, we really ask which states we're observing in the model and which state we're not observing. In this case, we have three years: 2001, 2002, and 2003. We're leaving 2001 out, or really the computer chose to. So what's happening here is that 2002 and 2003 are compared to the missing dummy variable. In this case, 2001 becomes our benchmark. So then we interpret 2002 as what's happening relative to our benchmark. This says 45. So the way to interpret this is, we're going to consume 45 more ice cream cones in 2002 relative to 2001.
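To see that benchmark reading in action, here is a rough Python sketch of the fitted equation. The coefficients are the rounded values quoted in the lecture, so treat the results as approximate; the function name `predict_cones` and the input values for price, temperature, and conventions are hypothetical.

```python
# Rounded coefficients as quoted in the lecture, with 2001 as the benchmark.
INTERCEPT = 384.0
B_PRICE = -120.0
B_TEMP = 1.78
B_CONV = 0.5
# The benchmark year has no dummy in the model, so its shift is zero.
B_YEAR = {2001: 0.0, 2002: 45.0, 2003: -17.0}


def predict_cones(price, temperature, conventions, year):
    """Predicted cones: intercept, slopes, plus the year's shift vs. 2001."""
    return (INTERCEPT
            + B_PRICE * price
            + B_TEMP * temperature
            + B_CONV * conventions
            + B_YEAR[year])


# Hypothetical inputs, held fixed so that only the year changes.
base = predict_cones(1.0, 80.0, 2, 2001)
diff_2002 = predict_cones(1.0, 80.0, 2, 2002) - base
diff_2003 = predict_cones(1.0, 80.0, 2, 2003) - base
print(round(diff_2002, 2), round(diff_2003, 2))
```

Holding everything else constant, switching the year only moves the prediction by that year's dummy coefficient, which is exactly the comparison to the benchmark.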
In 2003, we're saying we're going to consume about 17 fewer ice cream cones than we did in 2001. So the way I'm interpreting these is: what is this state, and how is it different from the benchmark state? Three states: 2001, 2002, 2003. The coefficients on the states that are in the model tell you how being in that state, 2002 or 2003, is different from the benchmark year, the year that's missing.
Okay. I'm going to run this model one more time, but now I'm going to include 2001 and 2002 instead of 2003. First, I'm going to make some predictions based on this information and see if they are correct. You'll notice that in this first model, 2003 is present and 2001 is absent. In 2003, we consumed 17 fewer cones than we did in 2001. So if 2003 is missing and is our benchmark, we should consume 17 more cones in 2001 than we did in 2003. Now, for 2002 you'll notice that it's 45 more cones than in 2001. So if 2002 is 45 more than 2001, and 2001 is 17 more than 2003, then when I benchmark on 2003, 2002 should be about 62 cones more than 2003.
Let's see what we get. So I'm going to run my regression here. But instead of using all of these variables, I am now going to use price, temperature, conventions, 2001, and 2002 as my inputs. My labels box is still checked. I'm going to hit go, and here we go. Here's my output. Now, I want you to notice something between these two sheets. My intercept: 367.91. Let's go back to sheet seven. It's close, 384 versus 367, right? It's a little bit different because the intercept now describes the 2003 benchmark rather than 2001. Look at my price: negative 120.22. It really is the same analysis. All I've done is switch my perspective on what's going on. Okay. Here, look at my temperature, 1.78. Again, 1.78. Number of conventions here is 0.50. Here it's 0.50, exactly the same.
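The prediction made above is just arithmetic, and it can be sketched in a few lines of Python. The numbers are the rounded year effects and the rounded intercept quoted in the lecture; `rebase` is a hypothetical helper name for this example.

```python
# Rounded year effects from the first regression, measured against 2001.
effects_vs_2001 = {2001: 0, 2002: 45, 2003: -17}


def rebase(effects, new_benchmark):
    """Re-express every year's effect relative to a different benchmark year."""
    shift = effects[new_benchmark]
    return {year: value - shift for year, value in effects.items()}


effects_vs_2003 = rebase(effects_vs_2001, 2003)
print(effects_vs_2003)

# The intercept absorbs the new benchmark's effect. 384 is the rounded value
# quoted for the first model, so this is only approximately the 367.91 reported.
new_intercept = 384 + effects_vs_2001[2003]
print(new_intercept)
```

Switching the benchmark shifts every year effect and the intercept by the same amount, while the slopes on price, temperature, and conventions stay where they were.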
Notice here, 2003 is 17 less than 2001. In this case then, 2001 should be 17 more than 2003. In fact, it is. Here, 2002 is 45 more than 2001. If it's 45 more than 2001, and 2001 is 17 more than 2003, then 2002 should be 62 more than 2003, and in fact it is. So understanding the nature of dummy variables, understanding the nature of the missing state, the benchmark, gives you more ammunition to interpret dummy variables, to use dummy variables, and to understand how you can use states in your regressions: male, female, different years, different months, maybe different regions of a country, things of that nature, to see if there are differences between different states of being.
So let's review multiple regression really quickly: more than one variable, and interpreting these things. We're using this thing called the ceteris paribus condition. I mentioned this a couple of times. It's Latin for all other things being held constant. Multiple regression allows us to hold more things constant and also gives us a much better picture of what's going on. The coefficients are interpreted very much the same way as in a simple regression. As our independent variable changes, the coefficient tells us the relationship between the independent variable and the dependent variable. In this case, price goes up, consumption goes down by 120. Temperature goes up, consumption goes up by 1.78. Conventions go up, consumption goes up by half a unit. For 2001, we have 17 more cones consumed relative to 2003, and 2002 has 62 more cones consumed relative to 2003. The t statistics tell us whether or not there's some statistical significance in what the coefficients are saying. Is it believable? Is it different from zero? A t stat that, for a large number of observations, is greater than about 1.96 in absolute terms says, "Hey, we can believe that this coefficient really is different from zero."
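That rule of thumb about 1.96 can be sketched in Python as well. This assumes a two-sided test at the 5% level with a large sample, which is the usual setting where 1.96 comes from; the lecture does not state the level explicitly. The function name `is_significant` and the example t stats are hypothetical.

```python
from statistics import NormalDist

# Two-sided 5% critical value of the standard normal distribution.
# The 5% level is an assumption here; it is where "about 1.96" comes from.
critical = NormalDist().inv_cdf(0.975)


def is_significant(t_stat, threshold=critical):
    """True when the t stat exceeds the threshold in absolute terms."""
    return abs(t_stat) > threshold


print(round(critical, 2))
# Hypothetical t stats, not values from the lecture's output.
print(is_significant(2.5), is_significant(-0.8))
```

The absolute value matters: a large negative t stat is just as convincing as a large positive one.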
I want you to try to replicate this stuff. Download these data again, try running a simple regression and try running multiple regressions, and see if you can get the same output that we showed here.