Let's look at another transformation: a logarithmic transformation. For these data, I downloaded information from Box Office Mojo about the movie Skyfall, the James Bond film, which is one of my favorite movies. For each weekend we have the release date, the movie's box-office rank, its weekend gross (how much it earned that weekend), the change in weekend gross, the number of theaters, the change in the number of theaters, the average gross per theater, the total gross to date, and the week of the run: week one, week two, week three, all the way through week 18. What if I wanted to regress the weekend gross as a function of the week? If we know how movies tend to do, can I predict how this one will do in week 4, week 13, or week 16? I might want to do this if I owned AMC or a similar chain. Should I stop leasing Skyfall and bring in one of the newest releases? Should I run it on three screens or two? Plenty of decisions depend on understanding the flow of revenue from a theatrical release. Let's look at the scatter plot of weekend gross against weeks. I'll highlight the weekend gross column and insert a scatter plot, then select my data so that the X variable is weeks. Let me make this chart a little bigger. Here is the data. It looks kind of like half of a smile. If I click on the series, I can add a trend line, so let's add a linear trend line. There it is. The linear trend line is trying to estimate the Y variable as a function of the X variable, and it does not capture very well what's actually happening in the data. You'll notice that very few points actually lie on or near the line.
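To see why the straight line fails, here is a small Python sketch (not the Excel workflow used in the lecture) that fits an ordinary least-squares line to hypothetical weekly grosses that start at $88 million and keep 60% of the previous week's take. These numbers are made up to mimic the shape of the curve; they are not the Skyfall data, and `fit_line` is a helper written only for illustration.

```python
# Hypothetical, noise-free data (NOT the Skyfall figures): $88M in week 1,
# then each week keeps 60% of the previous week's gross.
weeks = list(range(1, 19))
gross = [88_000_000 * 0.6 ** (w - 1) for w in weeks]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Fit a straight line to the levels, like the linear trend line in the chart.
intercept, slope = fit_line(weeks, gross)
residuals = [g - (intercept + slope * w) for w, g in zip(weeks, gross)]

# Sign of each residual, week by week: "+" means the actual gross is above the line.
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(signs)
print(f"week 18 prediction from the straight line: {intercept + slope * 18:,.0f}")
```

The residuals run positive, then negative, then positive again, which is the signature of a curved pattern that a straight line cannot follow. By the late weeks the line even predicts a negative weekend gross, which is impossible.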
The line does pick up the downward trend, but it misses the curve in the data. At week 13 it happens to hit the point almost exactly, but in general this is a terrible trend line for these data. What are we to do? It turns out this pattern shows up a lot in industry and in the world at large. This kind of curvilinear relationship is what we call a logarithmic relationship. You'd see the same shape if you plotted the populations of the 100 largest cities in the United States in rank order, or the total wealth held by the 10 or the 100 richest people in the world. We even use this shape on purpose: prize money for golf and tennis tournaments is paid out along curves like this. So if you find yourself with data that are curvy in this way, running a plain linear regression on them is a bad idea, because you get really weird predictions and really weird outcomes. Instead, we should perform what's called a logarithmic transformation. There are two logarithmic transformations that come up in business school time and time again, and the first, much more prevalent one is the natural log. The natural log is built on Euler's number, e, which is approximately 2.718282. The natural log asks this question: if I take 2.718282 and raise it to a power, what power would I need to get the number I'm looking at? Let's suppose I've got the weekend gross for week one of Skyfall, $88,364,714.
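Here is the same natural-log arithmetic in Python, using the week-one gross of $88,364,714 from the spreadsheet. With one argument, `math.log` is the natural log (the counterpart of Excel's LN), and `math.exp` raises e to a power (the counterpart of Excel's EXP), so it undoes the log.

```python
import math

week_one_gross = 88_364_714  # Skyfall's week-one weekend gross from the example

# Natural log: the power you must raise e (about 2.718282) to in order to get the gross.
ln_gross = math.log(week_one_gross)
print(round(ln_gross, 3))  # 18.297

# Undo the log by raising e to that power.
back = math.exp(ln_gross)
print(round(back))  # 88364714
```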
If I take the natural log of that week-one figure, typing =LN( and pointing at that cell, I get about 18.297. What does 18.297 even mean? It means that if I take Euler's number, 2.718282, and raise it to the power of 18.297, I get back the $88 million. So the natural log is the exponent to which you have to raise Euler's number to get the number you're looking at. Why is that useful? Let me show you. I'll copy the formula down the column and make a scatter plot of the natural logs against the number of weeks. I'll highlight the column, insert a scatter plot, and set the X variable to weeks again. Look at that: it looks very straight, which is quite nice. Let me add a trend line to this one too. The linear trend line in this second chart is a much better representation of the data than the linear trend line in the first chart. In the first chart I'm using levels: the weekend box office plotted against the weeks. In the second I'm using logs: the LN of the weekend box office plotted against the weeks. These transformed data sit much more nicely around a straight line. By transforming the data, I get a much better line of best fit, so there will be much less error around this second trend line than around the first. Now let's estimate this with a regression, which is what we're really talking about here: I'm going to regress the log of the weekend gross on the week itself. First I'll add a column label up top and call it LN weekend box. Then I'll open Data Analysis again and choose Regression.
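A rough Python analogue of the two charts fits a straight line to the levels and to the natural logs of the same hypothetical data (again a made-up 60%-retention series, not Skyfall's) and compares R². The helpers `fit_line` and `r_squared` are written for illustration. One caveat: the two R² values are measured on different scales (dollars versus log dollars), so treat the comparison as a rough guide to how straight each scatter is, not a formal test.

```python
import math

# Hypothetical, noise-free data (NOT the Skyfall figures).
weeks = list(range(1, 19))
gross = [88_000_000 * 0.6 ** (w - 1) for w in weeks]
ln_gross = [math.log(g) for g in gross]  # like the "LN weekend box" column


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y):
    """Share of the variance in y explained by the least-squares line."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


r2_levels = r_squared(weeks, gross)   # straight line through the raw dollars
r2_logs = r_squared(weeks, ln_gross)  # straight line through the natural logs
print(f"R^2, levels: {r2_levels:.3f}")
print(f"R^2, logs:   {r2_logs:.3f}")
```

Because these hypothetical data shrink by a constant percentage with no noise, the logged version is exactly a straight line and its R² is 1 up to rounding. Real data such as Skyfall's are noisier, so the logged fit will be good but not perfect.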
I'll regress the log of the weekend box on the week, and I'll call this output Skyfall output. So how do I interpret it? It looks like my regular regression output, but the coefficients don't seem to make sense. With the untransformed data, I'd read the intercept as something like "my movie is going to make $40 million, and it will lose some money every week." This output says the intercept is about 18. That doesn't sound right, because didn't my movie start at $88 million and go down from there? It did start at $88 million. But we've taken natural logs, so this model is predicting the natural log of the box-office amount, not the amount itself. So, as with the little table I built for Karl Malone and the principle game, I'll set up an input and an output: I want to predict the weekend box, and I'll predict it using the week. I type the week into an input cell, and the output cell holds my formula: the intercept plus the coefficient times the number of weeks. If I put in, let's say, week 10, the formula returns about 14. That does not mean my movie should make $14 million in week 10. Remember, this is the natural log of the weekend box: everything here is transformed, so I have to untransform it. To get the actual weekend box, I take the exponential of this number, which gives me the untransformed version.
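The untransforming step can be written out as a short derivation. Here b0 and b1 stand for the intercept and the week coefficient from the regression output, w is the week, and the hatted Y is the predicted weekend gross; these symbols are introduced only for illustration.

```latex
\begin{aligned}
\ln \widehat{Y}_w &= b_0 + b_1 w \\
\widehat{Y}_w &= e^{\,b_0 + b_1 w} = e^{b_0}\left(e^{b_1}\right)^{w} \\
\frac{\widehat{Y}_{w+1}}{\widehat{Y}_w} &= e^{b_1}
\end{aligned}
```

So the intercept of about 18 is measured in log dollars, and each extra week multiplies the predicted gross by e raised to b1. For instance, a hypothetical coefficient of b1 = -0.5 would give e^(-0.5), about 0.607: each week's predicted gross would be roughly 61% of the previous week's, a drop of about 39%.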
To undo a natural log, I take the exponential: I raise Euler's number to the power of the predicted log value (in Excel, the EXP function), and that tells me the actual number. And voilà, I get this right here. So my prediction is that, if the model is any good, my movie should be making something like $1.6 million in week 10. How well did we do? What actually happened in week 10? In week 10, the movie made about $1.5 million. That's not a bad model, and it's certainly better than the prediction I'd get from a regular linear regression on these data. So what's the takeaway? Sometimes we need to transform the data. Sometimes you look at the data and see that they don't look linear, so a linear regression on the raw data isn't really appropriate. What should we do? We could add a squared term, as we did for Karl Malone's output in relation to his age. Or we could take a natural log, as we did for the relationship between a movie's box-office gross and how many weeks it has been out. Sometimes it will be appropriate to take the natural log of both the dependent and the independent variable, or of just the dependent variable, or of just the independent variable. Other times we'll keep the dependent variable and the independent variable in levels and add a squared term, a square-root term, a cubed term, or a cube-root term. The truth is that we don't always know the true relationship. The tool we have is called linear regression, which means we're trying to fit a straight line. So sometimes we have to transform the data in our model and then run a linear regression on the transformed data.
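The whole round trip (log the outcome, fit a straight line, predict in log units, exponentiate back) fits in a short Python sketch. The data are the same kind of hypothetical 60%-retention series, not Skyfall's, and `fit_line` and `predict_gross` are illustrative helper names, not part of any library.

```python
import math

# Hypothetical, noise-free weekly grosses (NOT the Skyfall data):
# $88M in week 1, then 60% of the previous week's gross each week.
weeks = list(range(1, 19))
gross = [88_000_000 * 0.6 ** (w - 1) for w in weeks]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Step 1: transform the outcome from levels into natural logs.
ln_gross = [math.log(g) for g in gross]

# Step 2: regress ln(gross) on week.
b0, b1 = fit_line(weeks, ln_gross)


def predict_gross(week):
    """Step 3: predict in log units, then exponentiate back to dollars."""
    ln_prediction = b0 + b1 * week  # log dollars, not dollars
    return ln_prediction, math.exp(ln_prediction)


ln_10, dollars_10 = predict_gross(10)
print(f"week 10: ln prediction {ln_10:.2f}, untransformed ${dollars_10:,.0f}")
print(f"each week keeps about {math.exp(b1):.0%} of the previous week's gross")
```

For this made-up series the week-10 log prediction is about 13.7, which is obviously not $13.7 million; exponentiating it gives the dollar forecast, about $887,000.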
You'll notice from these plots that a straight line fitted to the transformed data follows them much more nicely than a straight line fitted to the raw data, because the raw data are in fact curvy. You're likely to see these transformations throughout your MBA studies: in your marketing class, your finance class, and your strategy and operations classes. Earnings, revenues, revenue cycles: these things often behave in logarithmic ways. So if you're going to interpret or forecast outcomes like these, it will serve you well to practice. See if you can transform a variable from a level into a natural log, run a regression on that natural log, and transform the prediction back into a level, so that you can turn a logged regression into a forecast of the actual outcome. Next, let's try some regressions you might actually have to run in your MBA, such as calculating beta.