Hi everyone, and welcome to our lecture on modeling with exponential functions. In this video we are going to model the world population. I have data here from the US Census Bureau for the years 1950 to 2000. It gives the population in millions, so the first number, for 1950, is not 2555, it's 2,555 million, or 2,555,000,000. So you would add six zeros to each one of these numbers to get what the US Census Bureau estimates for the world population. Okay, so I have all this data. And you've heard me say before, and it's a pretty popular idea, that population grows exponentially. So if I get a bunch of data and I'm wondering if an exponential model is the right fit, the first thing you normally do with data is visualize it in a scatterplot. And we're looking for a nice exponential curve. And here it is. I have the years plotted on my x-axis and, on my y-axis, the population in millions. So if you look at the graph here, we have a scatterplot. It moves upward to the right, as we would expect, but perhaps it's not obvious that we have an exponential model. Maybe linear is better, maybe some parabola or something else is even better. Who knows. So we'd like a way to test when an exponential model is a good fit. Just like we have a measure of how good a linear fit is, we want some sort of measure of how good the fit is to an exponential model. So let's go ahead and talk about that. To measure how good a fit is with an exponential model, we use something called the semi-log plot. So let's start off with an exponential model. We'll write down our model, which is a function y = Ce to the kx: some constant coefficient C in front, and e, our natural base, raised to the k times x. C and k are some real numbers. Remember, if I plug in 0, then I get e to the 0, which is 1, so C is my initial amount, wherever I start, my value when x is 0.
And k is my rate of increase or decrease, my growth constant or my decay constant, that is controlling things. And e is the number 2.718..., the irrational number. So now here's what I want to do. I want to take the log of both sides. Now, this is always a weird thing. Why would you introduce logs when there are no logs to begin with? Well, watch and find out. So we take logs of both sides, a perfectly legal move: ln of y equals ln of Ce to the kx. And now we're going to use properties of logarithms to write this product as a sum of logs. I write ln of C + ln of e to the kx, so the multiplication inside the log turns into addition. The log and the exponential are inverses, so they cancel. So I get that the natural log of y is the natural log of C + kx. Stare at this equation and let me know if it looks familiar. What's the graph of this? Do you recognize it? Let me reorder some things a little bit. If I write this as ln of y = kx + ln of C, does that help? Let's see. You'll probably say no right now. Remember, ln of C is just some number, so this is a constant. And if you think of k as your slope, all of a sudden this becomes the formula mx + b. This is a line, but the output variable now is not y, it's ln of y. That's why it's called a semi-log plot. So this is linear in the natural log of y. That's amazing, because now, if we have a good exponential function and I do this little trick of taking the log of both sides and regraphing my data, I still have my x-axis, but instead of y on my vertical axis, I have the natural log of y. If I have a good exponential function, then when I do the semi-log plot, I'll have a nice linear function. Maybe it's not a perfect line, but we can now apply our knowledge of linear models to this semi-log plot and ask: what is R squared? What's the linear correlation coefficient? What's the equation? We can play around with this and study it, and it becomes something we're more familiar with.
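
Written out as one chain of equalities, the log trick looks like the following. This is just the derivation from the lecture in symbols; it assumes C > 0 (and so y > 0), so that every logarithm is defined.

```latex
\begin{align*}
y      &= C e^{kx} \\
\ln y  &= \ln\!\left(C e^{kx}\right) \\
       &= \ln C + \ln\!\left(e^{kx}\right) \\
       &= \ln C + kx \\
       &= \underbrace{k}_{\text{slope } m}\, x + \underbrace{\ln C}_{\text{intercept } b}
\end{align*}
```

So on a plot of ln y against x, the slope of the line recovers k and the intercept recovers ln C, which means C is e raised to the intercept.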
So let's do that with our population data. This is a pretty standard way to test if something is good for exponential modeling. So if you're ever going to use an exponential model, it's probably best that you test this and look at this first. So here's what I did: I took the population data that I had, and I took the natural log, ln, of all the numbers. And you can check: before, I had 2555, and the natural log of that is 7.85. So I have my numbers, from 7.85 all the way up to about 8.7. So that will change my y-values. So let's label the vertical axis here in my scatterplot: this is the natural log of y. And for my x-values I do something else that's pretty common. I don't want to have the years 1950, 1960, 1970; they're kind of large numbers and they're not fun to play around with. So it's pretty common, though you don't have to do it, this is just a preference: when you start measuring your data, you start at year 0. It just makes things a little nicer to work with and tends to give you prettier pictures. It is just a preference; the math will all still work out. So now these are years since 1950. The first value is at 0, then 1960 becomes 10, and so on. And if you look at the graph, I see that these points line up, not perfectly, but man, they're pretty darn close to a nice line. So this linear semi-log plot is suggesting that the data is in fact exponential to begin with. Let's do some linear regression on this data and measure just how good it is. I leave this to you to check: if you look at the R squared value, you get 0.9976. That is incredibly close to 1. If I want the linear correlation coefficient, r, I take the square root of that value, and I get 0.9988 after rounding. That's so close to 1 that it screams that this semi-log plot has strong linear correlation, which in turn says our data has strong exponential correlation, so we model it with an exponential function.
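
If you want to try the semi-log test yourself, here is a minimal Python sketch of it: take ln of every y-value, fit an ordinary least-squares line to the points (x, ln y), and read off the slope, the intercept, and R squared. The function name `semilog_fit` is made up for this example. The full Census Bureau table is not reproduced in this transcript, so the data below is placeholder data generated from the lecture's fitted curve at ten-year steps (the spacing is an assumption); swap in the real table to reproduce the 0.9976.

```python
import math

def semilog_fit(xs, ys):
    """Fit y = C * exp(k * x) by least squares on the points (x, ln y).

    Returns (k, C, r_squared), where r_squared measures how linear
    the semi-log plot is.
    """
    logs = [math.log(y) for y in ys]          # the semi-log transform
    n = len(xs)
    x_mean = sum(xs) / n
    l_mean = sum(logs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sll = sum((l - l_mean) ** 2 for l in logs)
    sxl = sum((x - x_mean) * (l - l_mean) for x, l in zip(xs, logs))
    k = sxl / sxx                             # slope of the line
    ln_c = l_mean - k * x_mean                # intercept of the line
    r_squared = sxl ** 2 / (sxx * sll)        # squared linear correlation
    return k, math.exp(ln_c), r_squared

# PLACEHOLDER data, NOT the Census Bureau numbers: values generated from
# the lecture's fitted curve and rounded to whole millions.
years_since_1950 = [0, 10, 20, 30, 40, 50]
population_millions = [round(2576.1 * math.exp(0.0176 * x))
                       for x in years_since_1950]

k, c, r2 = semilog_fit(years_since_1950, population_millions)
print(f"k = {k:.4f}, C = {c:.1f}, R^2 = {r2:.4f}")
```

Because the placeholder points sit almost exactly on an exponential curve, R squared here comes out essentially equal to 1; real data will land a bit lower, as the 0.9976 in the lecture does.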
So we should feel pretty good about modeling population data exponentially, as we said before, but now we actually have a way to test it. Sometimes you get a bad linear correlation here, and then you wouldn't use an exponential function to begin with. So we like this little test. Semi-log plots are a pretty nice and pretty simple way to test an exponential model. Okay, let's go back to the data, now that we feel good about our model. So we will fit the population in millions, and we go through the same process. The scatterplot here is created by Microsoft Excel. I have the R squared value from before, 0.9976; we used the semi-log plot to get that. And I graph my nice exponential curve sloping upward as we go. Nice exponential growth for my values here. Notice that I put the years back on the x-axis, in terms of the standard years: 1950, 1960, all the way up to 2000. And the y-value is back in terms of its original value, because we're no longer testing if it's exponential; it's simply the population back. And notice that when I get the equation for this exponential function, I get kind of a funny number as my coefficient in front: this number 3E-12. That capital E is not our natural base, right? We have capital E and lowercase e, and this function is y = 3E-12e to the 0.0176x. What is this number? Well, you may recognize this; there's probably a button on your calculator for it too. It's a single E or a double E, and I've seen it capitalized and lowercase. This is scientific notation. So 3E-12 is the same as 3 times 10 to the negative 12. It is a decimal point, then lots of zeros, and then a 3 at the end. It's pretty darn small. Not the prettiest number to work with; 12 decimal places can get pretty hairy.
We don't particularly like it, and remember, this coefficient in front is supposed to be the value when I plug in 0. Why it's a problem is that my x-axis, the years, doesn't start at the origin. This isn't recommended. It's not a great way to do it. Yes, it'll all still work out, but we just don't like it. So what I'm going to do now is redo this graph. I'm going to keep my y-axis as the population in millions, and I'm just going to adjust the x-axis so that I have the years since 1950; that'll put our first value at 0. Here it is again, our nice new scatterplot with a new exponential function, and this exponential function is a little nicer. My scientific notation has gone away: y = 2576.1e to the 0.0176x. A little nicer. R squared has not changed. Let's talk about that for a second. There is no change. Why is that? Should it have changed? I changed the x-values. Notice my x-axis here is from 0 to 50. These are the years 1950 to 2000, so this is the years since 1950. If you remember from the last video, the R squared value does not depend on the units. It's independent of units. So whether I do years since 1950 or years since 1960, who cares? As long as I plot these points, it will not change: 0.9976. What's interesting about this model, and this is again produced by Excel when you go through the process and select an exponential model rather than a linear one, is that the leading coefficient here is 2576.1. Remember, this value up front is supposed to be y(0). You can do one of two things here. 2576 is close to our data point. Remember, our initial data point, at x = 0, was 2555. Okay, so obviously this value is rounded, and the population is just an estimate anyway. So 2555, 2576: maybe you want to override it so it matches your data, maybe not, but I just want to call out that they are close-ish but slightly different. This is your model. You can do whatever you want.
You can use 2555 as the coefficient or use the formula from the best fit. So now we have our model. Brilliant. We have our equation; we'll use the one from Excel here, y = 2576.1e to the 0.0176x. What do you do with this model? Well, we'd like to make predictions. So one of the questions that we can ask is, where is the world population going? What will the population be in, let's pick a year in the future? How about the near future, or the current year, 2020. Let's test the model against a known value; this is a pretty common thing to do. Let's see how it goes. I'm sure it's out there; we can ask the internet what the population is in 2020. I'm sure someone's keeping track and we'll find out, but let's first ask this model for the population in 2020. A couple of things about it: when we plug into our formula, we're not going to plug in 2020. Remember, I've adjusted the years; this is years since 1950. In particular, this is x = 70. So let's plug in and see what we get: y = 2576.1e to the 0.0176 times 70. I cannot do that in my head, so let us grab the handy-dandy calculator: 2576.1e to the 0.0176 times 70. All right, let us see what I get. Again, remember what this value means. I am going to round this thing: 8831. Now remember the units. This is millions. So the population it's talking about is 8.8 billion. This is like 8.8 billion people in 2020. This is what the model from Excel is telling us. Okay, so that's our prediction. Now, let's just talk about this value for a second. Does this guarantee that there are going to be 8.8 billion people in the year 2020? No. This is just what the model is saying. There's error in here, and who knows if this is the right curve. Notice also that I'm not using data from 2010, so I'm missing a data point. If I really wanted it to be more accurate, I would get more data.
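
The calculator step can be sketched in a few lines of Python. The coefficient 2576.1 and the rate 0.0176 are the Excel values quoted in the lecture; the function name `predict_population_millions` and its parameters are made up for this example. The one thing to get right is the shift: the model takes years since 1950, not the calendar year.

```python
import math

def predict_population_millions(year, c=2576.1, k=0.0176, base_year=1950):
    """Evaluate y = c * e^(k * x), where x is years since base_year.

    The result is in millions of people, like the original data.
    """
    x = year - base_year          # 2020 becomes x = 70
    return c * math.exp(k * x)

estimate = predict_population_millions(2020)
print(f"{estimate:.0f} million, or about {estimate / 1000:.1f} billion")
```

Dividing by 1000 at the end is only a unit conversion from millions to billions; the model itself is unchanged.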
This is also just the data from the US Census Bureau. I have a feeling that if I started looking around at other places that try to capture the world population, I perhaps would have had different data points, so we would get different models. The thing that I feel good about, though, is that this is a good exponential model. And I like that I'm using an exponential model, and not a linear model or something else, to model the world population. I'm going to show you this in Desmos too, since we also use Desmos in this class. I copied and pasted the spreadsheet right into Desmos. It automatically names the variables x sub 1 and y sub 1. So we can model this data. The way to do it in Desmos is you type y sub 1 and use the little tilde, that's the squiggle symbol, and then I enter Ce to the k x sub 1. C and k are just arbitrary names. In Desmos, you can pick A or B or whatever you want; it doesn't matter. As soon as you type it in, it will immediately give you the R squared value. And then it will estimate for you the coefficient C and the growth rate k. What's interesting here is that even with the same data set, Desmos and Excel are doing things slightly differently, which probably means a slightly different fitting procedure or some rounding. But I get a different value for C. My equation becomes y = 2609.11, and remember, the one we have from Excel is 2576. So they're pretty darn close, but it is interesting that they're not exact. So again, different software will give you different values, but hopefully when we look at things in terms of millions, it'll be close enough. So now we do e to the k. You can play with the settings, I guess, but Excel rounds things, while Desmos tries to give you as many decimals as you want. So k is 0.0172113. And let's ask Desmos the question again: hey Desmos, what will the population be in 2020?
And so remember, this is at x = 70, because I'm still using as my x-coordinates the years since 1950. So we'll take out our calculator and once again evaluate this number. Let's use our different numbers here: 2609.11 times e to the, and I copied the decimal carefully here, 0.0172113, which is no fun to say the long way, times 70. And you get, let's see here, I'm going to round again, 8704. So this model is kicking back about 8.7 billion. All right, so close enough for the two models. That would be your prediction. So we have fancy software like Excel, we have Desmos coming in and giving some close values, and here we go. So let's ask the knower of all things. Again, I have no knowledge of whether this is true or not, but I threw the question into Google. I said, hey, what's the world population in 2020? And the first number that came back was 7.8 billion. I just realized I'm sort of lying a little bit when I say 7.8; in our units I should really say 7800 million. It's a lot to say, but we're dealing with large numbers here. The point is 7 billion, really 7.8 billion people. That's a lot of people. However, it's not as many as what the models predicted; the models predicted 8.8 or 8.7 billion. So what happened? Are the models bad? No, they're off, as with any model. Each is a little incorrect. We could try to go back and do a better job with our data. Maybe we would not include some past years; maybe something weird, an anomaly, happened. Or we should go back and add the 2010 number and see what happens. There are lots of things you can do when you're not happy with the prediction, or maybe you say, close enough. So it's completely up to you. The point of all this is that we're trying to predict the future. Remember, that has a name, extrapolation, and whenever you're extrapolating, just be very, very careful. Some people will take these results as, like, this is what it is, this is the answer. No, you're making your best educated guess based on the model. The model might need some tweaking.
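
To put a number on "how far off", here is a small Python sketch that evaluates both fitted models at x = 70 and compares each with the reported figure. The two (C, k) pairs are the Excel and Desmos values quoted in the lecture. The 7800 million is the unverified search result mentioned above, so treat it as an assumption, and the names `exponential_model`, `models`, and `errors` are made up for this example.

```python
import math

def exponential_model(c, k, x):
    """Evaluate y = c * e^(k * x); y is in millions of people."""
    return c * math.exp(k * x)

x_2020 = 2020 - 1950              # years since 1950
reported_2020 = 7800.0            # millions; unverified figure from the lecture

# (C, k) pairs as quoted from each program's fit.
models = {
    "Excel": (2576.1, 0.0176),
    "Desmos": (2609.11, 0.0172113),
}

errors = {}
for name, (c, k) in models.items():
    predicted = exponential_model(c, k, x_2020)
    errors[name] = (predicted - reported_2020) / reported_2020
    print(f"{name}: {predicted:.0f} million, "
          f"{errors[name]:+.1%} relative to the reported value")
```

Both models overshoot by a little more than ten percent, which is the kind of gap you should expect when extrapolating 20 years past the last data point.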
If we were doing interpolation, where we were picking a value between 1950 and 2000, I would feel more confident, and maybe we can go do that and check what the population was in a year like 1975 or 1985. But we're extrapolating, so perhaps it's not surprising that we're off by a good amount. So great job with this video. We'll do some more modeling in the next one. See you next time.