Okay. So far we have run a simple regression. Now I want to run a more complex regression, what we call multiple regression. In our first case, with the ice cream data, we just looked at quantity as a function of price, but the truth is that a lot more things drive quantity. It might be price, it might be the temperature outside if you're going for a walk, something of this nature. What I'm going to do is run a multiple regression on these data, and I'm going to say, let's suppose that quantity is a function of both price and the temperature outside. Now, when I'm doing this I draw up some hypotheses. I say, well, I understand microeconomics, I know that quantity is probably a function of price, and the relationship will be inverse. So if price goes up, my quantity demanded will go down, and as price goes down my quantity demanded will go up, ceteris paribus, which means everything else is being held constant. That's a really easy hypothesis to test. My second hypothesis with respect to these data is, hey, I bet the quantity of ice cream demanded is a function of temperature. So the hotter it gets, the more we want to eat ice cream. That's also a very simple thing that I can test. In this case, I'm going to run quantity as a function of both price and temperature. Now, when I'm running regressions, what you should notice is that all of the variables I want to use as x variables in my regression have to be next to each other in the spreadsheet. So I'm going to go to my Data Analysis tab again, pull up Regression, and for my y range I am going to highlight the quantity of ice cream here, but for my x range, my independent variables, I'm going to highlight both price and temperature. Again, Labels is checked because there are labels on top of each one of these columns, and the output is going to be put in a new worksheet. I click "OK" and I get this output. How do I interpret this?
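For anyone following along outside the spreadsheet, here is a minimal sketch of the same kind of regression in Python, using NumPy's least-squares solver. The data are made up for illustration (they are not the lecture's ice cream data), and the function name `fit_multiple_regression` is just a label chosen for this example.

```python
import numpy as np

def fit_multiple_regression(y, *x_columns):
    """Ordinary least squares of y on an intercept plus the given x columns."""
    # First column of ones gives the intercept; the rest are the x variables.
    X = np.column_stack([np.ones(len(y)), *x_columns])
    coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefficients  # [intercept, one slope per x column, in order]

# Made-up data for illustration only (NOT the lecture's spreadsheet).
rng = np.random.default_rng(0)
n = 200
price = rng.uniform(1.0, 3.0, size=n)           # dollars, assumed range
temperature = rng.uniform(50.0, 95.0, size=n)   # degrees Fahrenheit, assumed range
noise = rng.normal(0.0, 5.0, size=n)
quantity = 280.0 - 80.0 * price + 2.5 * temperature + noise

intercept, b_price, b_temp = fit_multiple_regression(quantity, price, temperature)
print(f"quantity = {intercept:.2f} + ({b_price:.2f}) * price + ({b_temp:.2f}) * temperature")
```

Because the fake data were built with a negative price effect and a positive temperature effect, the fitted slopes should come back with those same signs, which is the pattern the two hypotheses predict.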
I'm going to write my little equation as I did before. Quantity is equal to 278.14 minus 82.56 times price plus 2.40 times temperature. How do I interpret this? What does this equation mean? This equation means, look, the baseline consumption in Central Park, the intercept, is about 278 ice cream cones; that is the predicted quantity when price and temperature are both zero. For every one dollar increase in price, holding temperature constant, my consumption of ice cream cones is going to go down by about 82 units, and holding price constant, for every additional one degree Fahrenheit increase in temperature, the consumption of ice cream cones is going to go up by about 2.4 units. So price has a negative relationship, ceteris paribus, and temperature has a positive relationship, ceteris paribus. This technique we call multiple regression. It is incredibly valuable because when you are trying to understand relationships between variables in your classes, drivers of demand, or the relationship between a return and some kind of policy, or GDP growth and different factors, there's not going to be just one factor. There are going to be multiple factors that might influence some kind of outcome in your models, in your discussions, in your cases. In this case, I have reason to believe that both price and temperature impact the quantity demanded of ice cream. Pretty good reasons, we can see this actually happening. The analysis allows me to test both of these things at the same time. Now, you'll notice here in my example that both price and temperature are statistically significant. The coefficient of negative 82.56 has a t-stat of negative 3.97 and my p-value is a lot smaller than 0.05, so you can say, look, negative 82.56 is statistically significant at the 95 percent confidence level; it's statistically different from zero. Same thing with this coefficient of 2.40: it has a t-stat of 5.90 and the p-value is very, very small. It means it is statistically believable that the true coefficient is not zero, and 2.40 is our best estimate of it.
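To make the "holding everything else constant" reading concrete, here is a small sketch that plugs numbers into the fitted equation. The coefficients are the ones quoted in the lecture; the price of 2.00 dollars and the temperature of 80 degrees are hypothetical inputs picked only for this example.

```python
def predicted_quantity(price, temperature):
    """Fitted equation from the lecture: Q = 278.14 - 82.56 * P + 2.40 * T."""
    return 278.14 - 82.56 * price + 2.40 * temperature

# Hypothetical starting point: $2.00 per cone on an 80 degree day.
base = predicted_quantity(2.00, 80.0)

# Raise price by one dollar, temperature held constant.
price_effect = predicted_quantity(3.00, 80.0) - base

# Raise temperature by one degree, price held constant.
temperature_effect = predicted_quantity(2.00, 81.0) - base

print(f"baseline prediction:      {base:.2f} cones")
print(f"one more dollar of price: {price_effect:+.2f} cones")
print(f"one more degree:          {temperature_effect:+.2f} cones")
```

Changing one input while leaving the other fixed recovers exactly that input's coefficient, which is all that ceteris paribus means in this equation.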
This model allows me to forecast what consumption is going to be at a particular moment in time, a particular month, what have you, based on what I think the price is going to be and what I think the temperature is going to be. The truth is that we can run regressions with more than one variable: we can run two variables, we can run three variables, four variables, and what have you. So now I'm going to show you a multiple regression using three and then four variables from these data, and show you some of the hiccups that you might have when you're inputting data that might have to be viewed slightly differently. So let's look at our data one more time. Here we have quantity of ice cream, price, temperature, number of conventions, and the year. So let's suppose that you think that the quantity of ice cream is driven by price, it's driven by temperature, and it's driven by the number of people who are around Central Park, so the number of conventions should matter. You also think that maybe there are fluctuations over time. So let's suppose that you say, just sort of naively, let's run a regression where my quantity of ice cream, which is this right here, is a function of all of these things. I just highlight all of them and I click "OK." This is what you'd get. How would I interpret this? I would say the quantity of ice cream is equal to, oh my gosh, look at that large number, 38649 minus 44 times price plus 2.28 times temperature plus 0.76 times the number of conventions minus 19.21 times the year. I say okay, how do I interpret this? Well, this is a bit of an interesting interpretation. It says, look, on any given Sunday maybe people are going to be consuming 38,000 ice cream cones. That seems like a lot; the intercept coefficient seems really high. Price: every one unit that price goes up is going to drive away 44 units of ice cream consumption. As temperature goes up by one degree, I increase my consumption by 2.28 units.
The number of conventions. Okay, so with a few more conventions in town, my ice cream consumption goes up, but it's pretty minor. The year, you say, so as the years are getting larger my consumption is going down. You look at these t-statistics and you say, yeah, this intercept is insignificant. My t-stat on price is negative 1.95, so it's marginally significant; my p-value here is about 0.05, so we'll say it's marginally significant. Temperature is meaningful. Number of conventions is marginally significant at about 10 percent; the p-value here is 0.09, so I believe this thing at about the 10 percent level. The year, this is significant. So everything is significant, this is a great model. This is a trap that you might fall into: you start looking at the output and paying attention to t-values and p-values, and then you are driven by statistical significance, but you aren't necessarily thinking about what the model means. I have a variable for year because I took these data over three different years. When we put year in as a continuous variable, the regression treats 2001, 2002 and 2003 as the numbers 2001, 2002 and 2003 rather than as three distinct years. So the interpretation becomes a little bit challenging, because we don't know exactly what it means to increase by one more year, whether that's one more year from 2001 to 2002 or one more year from 2002 to 2003. What does that mean? As a year passes we just consume less ice cream, because why? We don't like it, or something is happening in the economy? Forcing the variable to take on a continuous nature gives us a very strange interpretation.
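A quick sketch shows what treating year as a plain number does to this equation. The coefficients are the rounded values read out in the lecture; the price, temperature and number of conventions below are hypothetical inputs chosen only for illustration.

```python
def predicted_quantity_with_year(price, temperature, conventions, year):
    """Four-variable equation from the lecture, with coefficients as read aloud."""
    return (38649 - 44 * price + 2.28 * temperature
            + 0.76 * conventions - 19.21 * year)

# The year term is a huge negative number, which is why the intercept is huge.
intercept_net_of_2001 = 38649 - 19.21 * 2001
print(f"intercept plus year term for 2001: {intercept_net_of_2001:.2f}")

# Hypothetical day: $2.00 per cone, 80 degrees, 10 conventions in town.
q2001 = predicted_quantity_with_year(2.00, 80.0, 10, 2001)
q2002 = predicted_quantity_with_year(2.00, 80.0, 10, 2002)
q2003 = predicted_quantity_with_year(2.00, 80.0, 10, 2003)

# A numeric year forces every one-year step to be the same size.
print(f"2001 to 2002: {q2002 - q2001:+.2f} cones")
print(f"2002 to 2003: {q2003 - q2002:+.2f} cones")
```

Two things fall out of this. The 38,649 intercept is mostly there to cancel the year term, so it should not be read as a realistic level of consumption. And the model has no way to let 2002 be a good year and 2003 a bad one: each step from one year to the next is forced to be the same change in cones.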
So one of the things that you have to be aware of when you're running regressions is that it's very easy to just click on the tab, run a regression, highlight your dependent variable, highlight a whole bunch of independent variables, click "OK", look for some significance and then just say, "Terrific, I've got a great model, my professors are going to be super happy." You have to stop and think about the tool. It's just a tool, and the underlying motivation for the tool is to understand relationships. How does the movement of one variable relate to the movement of a different variable? Even though everything looks terrific here, with everything significant or marginally significant, the interpretation is quite challenging because we're treating year as if it's a continuous variable rather than a state: 2001 was one thing, 2002 was something else, 2003 was something else entirely. So I'm going to show you how to set up dummy variables for these states and how to run your regression using dummy variables. The takeaway here is that regressions can be easy once you start getting into the flow of this thing. Everything can look really terrific when you're running regressions. You end up being motivated by statistical significance, and you let that motivation stand in for a good interpretation and understanding of what the regression is trying to tell you, which is some relationship between variables. So the warning is, don't let statistical significance overwhelm you so much that you fail to identify the model that we're trying to understand, the relationship in the industry that we're really trying to get a grip on. Even though this model looks good, it is a little bit problematic. So let's do some dummy variables and I'll show you how to interpret those.
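As a preview of the idea, here is a minimal sketch of turning year into states rather than a number. It uses one common encoding, a 0/1 column for each year except a baseline year, which may differ from the exact spreadsheet steps used in the lecture. The helper name `year_dummies` and the short list of years are hypothetical.

```python
import numpy as np

def year_dummies(years, baseline):
    """Return a 0/1 column for every year except the baseline year."""
    years = np.asarray(years)
    levels = [int(y) for y in np.unique(years) if y != baseline]
    # Each column is 1 when the observation falls in that year, else 0.
    return {f"year_{level}": (years == level).astype(int) for level in levels}

# Hypothetical observations spread over the three years in the data.
years = [2001, 2001, 2002, 2003, 2003, 2002]
dummies = year_dummies(years, baseline=2001)

for name, column in dummies.items():
    print(name, column.tolist())
```

With 2001 as the baseline, the coefficient on each dummy column would be read as the difference in ice cream consumption between that year and 2001, holding the other variables constant, so each year is free to have its own effect.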