So, I've been using the words "confounder" and "confounding" a lot, so let me define them. Variables that are correlated with both the explanatory and response variables are confounders, and they can distort an estimated effect. In this case, the victim's race was correlated with both the defendant's race and whether or not the death penalty was imposed. This is, I think, kind of an old-school definition of confounding. There's a modern definition given in causal inference classes that, if you're interested in the field of statistics, will be worth learning. Ultimately, with a confounder here, we haven't really distinguished between something that's causally related to race and the death penalty versus something that merely has a statistical association with them. In this class, we're mostly going to be talking about things that have a statistical association with the explanatory and response variables, where there is a plausible causal connection between them. Okay. So, putting aside the rather difficult and lengthy discussion of what a confounder is and how we select our confounders, let's assume we have a single confounder. How do we adjust for it? Well, there are several ways; regression is probably the most common way to adjust for confounders. But the kind of old-school way in categorical data analysis is to stratify by the confounder and then combine the stratum-specific estimates. This requires appropriate weighting of the stratum-specific estimates, and we'll talk in a minute about how to do that weighting. And unnecessary stratification has its own set of problems, which brings us back to this discussion.
The solution to the confounding problem is not just to stratify or adjust for everything in sight. That's not the solution, because it has its own host of consequences. For example, in any of these Simpson's paradox examples, imagine a giant database. You're interested in, say, the death penalty, and your database has lots of other variables. For sure you could find one variable that reverses the association but has no bearing on whether or not a person received the death penalty. Adjusting for that variable would reverse the association, but it has no real business being adjusted for. So, admittedly, it's a hard topic: selecting confounders and achieving a balance between the right amount of confounder adjustment and over-adjustment. Let's stipulate for the time being that we have a confounder and we want to adjust for it. What I really want to talk about is the method of stratifying and then combining the stratum-specific estimates, because then we'll be able to teach you some nice methodology. And as you take more statistics courses, you'll learn more about the delicate surgery of dealing with statistical confounding. Okay, so here I have an aside, but it's an important aside. Suppose you have two scales, and by scales I mean things for weighing objects. Let's assume both scales are so-called unbiased. They both have some variance associated with them: if you weigh the same thing over and over again, you get different answers. But one has a variance of one pound and the other has a variance of nine pounds, and they're both unbiased. So, confronted with weights from both scales, would you give both measurements equal credence? Let's suppose we weigh an object, and our first weight is the variable X1, which we'll assume is normal.
It has mean mu and the variance of the first scale, sigma 1 squared. And X2, because both scales are unbiased, we'll assume is normal with the same population mean, mu, but a different variance, sigma 2 squared. Let's assume both sigma 1 and sigma 2 are known; we want to estimate mu, the unknown weight of the object. Okay, so we've measured the object with one scale at one precision and with another scale at another precision, and we're assuming both scales are unbiased: if we measured the same object over and over again, the average would be about right. If we characterize it this way, I'm hoping everyone in the class can set up the likelihood: multiply the two likelihoods and take the log, or equivalently add the two log likelihoods, and then disregard any terms that don't involve mu. I'm hoping everyone can arrive at the fact that the log likelihood for mu looks like the bottom equation right here. Okay. Now let's solve for the maximum likelihood estimate. The easiest way is to take the derivative, set it equal to zero, and you get this answer: x1 times r1 plus x2 times r2, divided by r1 plus r2. Or in other words, x1 times p plus x2 times 1 minus p, where p is r1 over r1 plus r2 and 1 minus p is r2 over r1 plus r2. In this case, ri is 1 over sigma squared sub i, so p is 1 over sigma 1 squared divided by the sum of 1 over sigma 1 squared and 1 over sigma 2 squared. So, why does this make sense? It makes a lot of sense to me now, but the first time you see it you might say it makes no sense, so let me describe why it makes a ton of sense. Notice what each ri is: it's 1 over the variance. So if, let's say, sigma 1 is huge, in other words that scale stinks and has this huge variance, then r1, the weight given to the measurement x1 from that scale, is very low.
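To make the formula concrete, here's a minimal Python sketch of that maximum likelihood estimate; the function name is my own for illustration, not anything from the lecture.

```python
def inverse_variance_estimate(x1, sigma1_sq, x2, sigma2_sq):
    """MLE of mu from two unbiased normal measurements with known variances."""
    r1 = 1.0 / sigma1_sq  # precision (weight) of the first scale
    r2 = 1.0 / sigma2_sq  # precision (weight) of the second scale
    # Convex combination: p * x1 + (1 - p) * x2, with p = r1 / (r1 + r2)
    return (x1 * r1 + x2 * r2) / (r1 + r2)
```

For instance, with measurements 10 and 20 and variances 1 and 9, the estimate lands at 11, much closer to the low-variance measurement.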
Conversely, if sigma 2 is very small, then r2, which is 1 over sigma 2 squared, will be a huge number, and x2 is given a gigantic weight. And then we divide by r1 plus r2, so that when we weight these two things, x1 and x2, we get a convex combination: p times x1 plus 1 minus p times x2. So it's just a weighted average. Okay? And by the way, you can always do this if you want a generalized form of average: take r1 as the weight for x1 and r2 as the weight for x2 (they have to be positive, of course), and to turn it into an average, divide the whole thing by r1 plus r2; then you've turned it into p times one measurement plus 1 minus p times the other. If r1 equals r2, it's the straight arithmetic average of the two numbers; if r1 differs from r2, it weighs one of the observations more than the other. The answer in this case is that we want to weight by the inverse of the variance, giving high-variance measurements low weight and low-variance measurements high weight, which to me makes a ton of sense. Okay. So, anyway, the general principle: instead of simply averaging several unbiased estimates, take an average weighted according to the inverses of the variances. This is so ingrained in statistical practice now that people do it without thinking; they don't go through this exercise of deriving the maximum likelihood equations. So, in our case, sigma 1 squared was 1 and sigma 2 squared was 9. You can work it out: p comes to 0.9, so the estimate is 0.9 times the first measurement plus 0.1 times the second measurement, with the first measurement getting a lot more weight because its scale has one ninth the variance of the other.
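As a quick numerical check of that last example, here's a tiny script (my own, just plugging in the lecture's numbers):

```python
# Lecture's example: sigma1^2 = 1, sigma2^2 = 9
r1 = 1 / 1.0          # precision of the first scale
r2 = 1 / 9.0          # precision of the second scale
p = r1 / (r1 + r2)    # weight on the first measurement
print(round(p, 4))    # 0.9
```

So the better scale, with one ninth the variance, carries nine times the weight.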