In this screencast and the following two I'm going to introduce you to mathematical models, particularly making mathematical models from experimental data. A model like this is also known as a regression equation or a regression model. Oftentimes we have experimental data and we're trying to find some sort of interesting correlation between two variables. A lot of times this is for predictive purposes, so we want to be able to correlate the inputs to our process with the output. In that case we can use the model to predict something in the future. All models are going to have one or more input variables, and then some sort of mathematical equation that turns them into an output. Y, I'm going to refer to as our output, that's also known as the dependent variable, and input variables are typically x. If we had one input variable, that would be x, it's an independent variable, but we can have multiple independent variables. For example, we could have x1, x2, x3, and so on. I'm going to start simple and just assume we have one input variable. Maybe we have a coffee shop, the output is the daily sales, and the input that we're interested in is the outside temperature. To obtain experimental data, maybe you collect data for a couple of weeks. You measure the average daily temperature, and that'll somehow correlate with the sales. So if it's cold outside, you might get more coffee sales. If it's really warm outside during the summer, you probably won't have as many hot coffee sales. You might have more cold coffee sales or something else. So you go out and you collect experimental data. Average daily temperature is the independent variable, the one that varies, and the dependent variable is going to be the coffee sales. Coffee sales depend upon the average daily temperature. So we go out and collect maybe 20 or 30 data points, and this is what we get.
Now if you own the coffee shop, you might want to come up with some sort of model so that you can predict the coffee sales, that's going to be our output, as a function of the average daily temperature. So maybe during the summer you can expect to see lower sales, but during the winter you can expect to see more. Or the next day is predicted to be 50 degrees, so you can predict the number of coffees that you're going to sell the next day and make preparations accordingly. So what we're trying to do is come up with a mathematical equation where y is some function of x. Then we can put the daily temperature into this model as the x variable to predict an actual value for coffee sales. We're first going to look at simple linear regression, which is just the equation for a line. Some of you may have seen the format y = mx + b, and that's the exact same format. What I'm doing, though, is using a slightly different notation, y = β0 + β1x. So compared with y = mx + b, m would be equivalent to β1, and b, which is the intercept, would be the coefficient β0. So β0 and β1 and other betas are just coefficients, they're constants. Now I'm going to go through a different example that I've got actual data for. This spreadsheet is found on the course website, you can go ahead and download it. What we've got is, as a function of the year from 2000 to 2009, the average yogurt consumption in the United States, in pounds per person per year. If I just wanted to look at this in a plot, I could select both of those columns, go up here to the Insert tab and Charts, and make a quick scatterplot. Now, obviously that looks a lot like a line. I'm going to go ahead and format the axes. A lot of you probably know that you can right-click on a data point and go down here to Add Trendline. And over here on the right you can display the equation on the chart.
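The trendline on the chart is a least-squares fit of a straight line. As a rough sketch of what that fit computes, here is the same calculation in plain Python. The data values and the names `fit_line` and `predict` are made up for illustration; they are not the spreadsheet's numbers and not anything Excel exposes.

```python
# Hypothetical (x, y) pairs for illustration only; NOT the spreadsheet's data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(xs, ys):
    """Return (beta0, beta1) for the least-squares line y = beta0 + beta1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta1 = sxy / sxx                 # slope (the "m" in y = mx + b)
    beta0 = y_mean - beta1 * x_mean   # intercept (the "b" in y = mx + b)
    return beta0, beta1


def predict(beta0, beta1, x):
    """Evaluate the fitted line at a new input x."""
    return beta0 + beta1 * x


beta0, beta1 = fit_line(xs, ys)
print(f"y = {beta0:.4f} + {beta1:.4f} x")
print("prediction at x = 6:", predict(beta0, beta1, 6.0))
```

Once you have β0 and β1, predicting is just plugging a new x into the line, which is what you would do by hand with the equation shown on the chart.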
You can also display the R-squared value, which is the square of the correlation coefficient r. But this is then the mathematical equation that relates y, which is our yogurt consumption, with x, which is our year. So if I wanted to predict into the future, I could just put that year into this equation and get an estimate for the yogurt consumption. I'm going to show you a different way to create this equation of the line, and there are a couple of reasons for that. The first one is that with the Regression tool in Excel you can get a lot more information than just what's shown here. In this example, we know that the slope is approximately 0.6825, however, there's uncertainty in that. When you take more advanced statistics courses you'll learn how to estimate the uncertainty. But I just wanted to show you how we can use the Regression tool to get something known as the confidence intervals about the slope and the intercept, which here is about -1358. I'll leave this chart up for now, but I'm going to show you how to use the Regression tool. I'm going to go up here to the Data tab. Now if you don't have the Data Analysis tool over here, you can go into File > Options, click on Add-ins, and then at the bottom click Go. Then make sure that the Analysis ToolPak is selected, it doesn't matter if you're using the VBA one or not, just make sure one of those is selected, and go ahead and click OK. Now in your Data tab you should have the Data Analysis tool over there. So to do regression we're going to click on Data Analysis. It's going to bring up this box. We're going to go down here toward the bottom, click Regression, and then click OK. It brings up the Regression tool. Yogurt consumption depends upon year, so for the input y I'm going to highlight the yogurt consumption column, starting in the first row. You notice that the first row contains labels, it's not an actual value.
So because of that, I'm going to click Labels here, and the Regression tool will know that the first row is our labels. For the input x, I'm going to select A1 through A11, and I'm just going to leave everything else and click OK. What it does is create a new sheet. If you click somewhere else and then go up here and drag-select all of those columns, you can double-click one of the borders between the columns and it automatically resizes everything to fit. You'll learn a lot more about this output if you ever take more advanced statistics classes, but the most important thing for us is the intercept of -1358. You notice that that's exactly what we got back here in the plot. We have -1358 as the intercept and a slope of about 0.6825, so it tells you the same values. However, the important things are these confidence intervals and P values, which you might learn about later. This is telling us that we're 95% confident that the intercept lies between those two values, and 95% confident that the slope lies between these two values. Remember, we can never be 100% sure of the intercept or the slope. So this is a good way to create regression models. Another thing it tells us, if we go back here to the plot, is this value known as R squared, the correlation coefficient squared. But you'll see in more advanced statistics courses that a better measure of a good model is not the R squared but something known as the adjusted R squared, which the chart trendline doesn't show but the Regression tool does. So this screencast shows you how we can make a quick plot to get an equation, and how the Regression tool gives us more detail about that equation. In the next screencast we're going to work with more advanced models, in particular polynomial models and general linear models. Thanks for watching.
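If you want to check by hand the kind of numbers the Regression tool reports, the sketch below computes 95% confidence intervals for the intercept and slope, plus R squared and adjusted R squared, for a one-input linear model. It uses the standard textbook formulas, which I am assuming match what the tool reports. The data are made up (they are not the yogurt values), `regression_summary` is a hypothetical helper name, and the t critical value 2.306 is hardcoded for 10 data points (8 degrees of freedom), so it would need to change for a different sample size.

```python
import math

# Hypothetical data for illustration only; NOT the yogurt spreadsheet's values.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2, 8.8, 10.0]

# Two-sided 95% t critical value for n - 2 = 8 degrees of freedom.
# Assumption: valid only for n = 10 points; look up a new value otherwise.
T_CRIT_DF8 = 2.306


def regression_summary(xs, ys, t_crit):
    """Fit y = beta0 + beta1 * x and return fit statistics in a dict."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = y_mean - beta1 * x_mean

    # Residual and total sums of squares.
    sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_mean) ** 2 for y in ys)

    r2 = 1.0 - sse / sst
    # Adjusted R squared with one input variable (p = 1).
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)

    # Standard errors of the coefficients.
    s2 = sse / (n - 2)  # residual variance estimate
    se_beta1 = math.sqrt(s2 / sxx)
    se_beta0 = math.sqrt(s2 * (1.0 / n + x_mean ** 2 / sxx))

    return {
        "beta0": beta0,
        "beta1": beta1,
        "intercept_ci": (beta0 - t_crit * se_beta0, beta0 + t_crit * se_beta0),
        "slope_ci": (beta1 - t_crit * se_beta1, beta1 + t_crit * se_beta1),
        "r2": r2,
        "adj_r2": adj_r2,
    }


summary = regression_summary(xs, ys, T_CRIT_DF8)
for name, value in summary.items():
    print(name, value)
```

The adjusted R squared is always a little lower than R squared here because it penalizes the model for the number of input variables, which matters more once you move to models with several inputs.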