In this lecture, we're going to cover some basic distributional results. Before I get to the setting where we have a response and a predictor, let me talk about some results where X is normal with mean vector mu and variance matrix Sigma. So X, I'll assume, is n by 1. Now, we all know that if we take Sigma to the minus one-half times (x - mu), that's normal(0, I). So if I take (x - mu) transpose Sigma inverse (x - mu), that's merely the inner product of that vector with itself. So it's the sum of a bunch of squared iid standard normals, which means it has to be chi squared with n degrees of freedom.

But we can actually make a stronger statement about quadratic forms. Let A be an n by n symmetric matrix, not necessarily full rank, where the rank of A is p, which is not necessarily equal to n. Now consider the quadratic form (x - mu) transpose A (x - mu). That will be chi squared with p degrees of freedom, where p is again the rank of A, if and only if A Sigma is idempotent.

So let's prove this result; it's surprisingly easy to prove, though we're only going to prove one direction. The fact that A Sigma is idempotent means that A Sigma times A Sigma equals A Sigma. But I'm going to rewrite this in a way that's useful for me by getting rid of one of those Sigmas. Remember, we're assuming that x is a normal vector, not a singular normal vector, so Sigma is invertible. That means I can write A Sigma A = A. Then let me write the eigenvalue decomposition of A as V D V transpose, where D is a p by p diagonal matrix of the nonzero eigenvalues, V is n by p, and V transpose is p by n. Because it's an eigenvalue decomposition, V transpose V equals I, the p by p identity matrix. So take the statement A Sigma A = A, and I'd like to write it out using this decomposition. I get V D V transpose Sigma V D V transpose equals V D V transpose, okay?
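To make that first fact concrete, here's a minimal simulation sketch (my own illustration in numpy; the particular mu and Sigma are arbitrary choices, not from the lecture). It draws many copies of X ~ N(mu, Sigma) and checks that the quadratic form (x - mu) transpose Sigma inverse (x - mu) has the mean n and variance 2n of a chi squared n variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
mu = np.array([1.0, -2.0, 0.5, 3.0])

# An arbitrary positive-definite covariance matrix.
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)

# Many draws of X ~ N(mu, Sigma).
reps = 200_000
X = rng.multivariate_normal(mu, Sigma, size=reps)
centered = X - mu

# Quadratic form (x - mu)' Sigma^{-1} (x - mu) for each draw.
q = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(Sigma), centered)

# Chi squared with n degrees of freedom has mean n and variance 2n.
print(q.mean(), q.var())
```

The empirical mean and variance should land close to 4 and 8 here, consistent with a chi squared 4 distribution.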
Now imagine I premultiply this by V transpose and postmultiply it by V; I have to do the same operation on both sides. I then get D V transpose Sigma V D equals D. Next I multiply both sides, on the left and on the right, by D to the minus one-half. The square root here is easy to think about, because D is diagonal, so it's just the square root of each element down the diagonal. That gives D to the one-half V transpose Sigma V D to the one-half equals I. Okay, so idempotence and the eigenvalue decomposition imply this relationship.

Now let's prove our result, at least one direction of it. Consider the vector D to the one-half V transpose times (x - mu). That's for sure going to be normally distributed, because whenever we apply linear operations to a normal vector, it stays normal. Its mean is 0, which I think is pretty easy to see, and its variance is D to the one-half V transpose Sigma V D to the one-half. So the variance of this vector is exactly the quantity we showed above to be I, the p by p identity matrix. So this quantity right here is exactly normal(0, I).

Now take the inner product of this vector with itself: (x - mu) transpose V D to the one-half D to the one-half V transpose (x - mu), which is equal to (x - mu) transpose A (x - mu). So what does this imply? It implies that this quadratic form is the sum of p squared iid standard normals, and is therefore chi squared with p degrees of freedom.

Let's go through an example where the matrix in the middle of the quadratic form isn't simply Sigma inverse. I'm going to reuse some notation here, so I'm drawing this horizontal line to represent the fact that I'm now going to start over on notation for this lecture.
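Here's a sketch of a numerical check of this direction of the theorem (again my own illustration, assuming numpy). One easy way to manufacture a matrix satisfying the idempotence condition is to take a rank-p orthogonal projection P and set A = Sigma^{-1/2} P Sigma^{-1/2}; then A Sigma is idempotent with trace p, and the quadratic form should come out chi squared p.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 2

# An arbitrary positive-definite Sigma and its inverse square root.
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)
w, U = np.linalg.eigh(Sigma)
Sigma_inv_half = U @ np.diag(w ** -0.5) @ U.T

# A rank-p orthogonal projection P, then A = Sigma^{-1/2} P Sigma^{-1/2}.
M = rng.standard_normal((n, p))
P = M @ np.linalg.solve(M.T @ M, M.T)      # projection onto col(M)
A = Sigma_inv_half @ P @ Sigma_inv_half

AS = A @ Sigma
print(np.allclose(AS @ AS, AS))            # A Sigma is idempotent
print(np.trace(AS))                        # its trace is p, the rank

# Simulate the quadratic form (x - mu)' A (x - mu).
mu = rng.standard_normal(n)
X = rng.multivariate_normal(mu, Sigma, size=100_000)
q = np.einsum("ij,jk,ik->i", X - mu, A, X - mu)
print(q.mean(), q.var())                   # should be near p and 2p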
I'm going to assume that y, my outcome, is normally distributed with mean X beta and variance sigma squared I. Now consider the sum of the squared residuals, e transpose e, divided by sigma squared. That's equal to, and I hope this is old hat for everyone now, y transpose times (I minus the hat matrix) times y, over sigma squared. But I could just as easily write this as (y minus X beta) transpose times (I minus the hat matrix) times (y minus X beta), over sigma squared. The reason I can do that is that (I minus the hat matrix) times X is 0, so putting in the X beta terms doesn't change anything.

Well, what is this? This is a normal vector minus its mean, sitting in a quadratic form with a matrix in the middle. So according to my result, this will be chi squared if and only if I take the matrix (I minus H of x) over sigma squared, which is the A matrix in my earlier notation, and multiply it by the variance matrix from before, the one I labeled Sigma; here that's sigma squared I. So that product is (I minus H of x) over sigma squared times sigma squared I, which equals I minus H of x. We've seen on many occasions that that's idempotent.

Now let's go through an argument about the rank of I minus H of x. The rank of a symmetric idempotent matrix is its trace, so the rank equals the trace. That's the trace of I minus the trace of H of x, where H of x is X times (X transpose X) inverse times X transpose. The trace of I, remember this is an n by n matrix, is n. And for the trace of H of x, because trace of AB equals trace of BA, I can write it as the trace of (X transpose X) inverse times X transpose X, which is the trace of a p by p identity matrix, namely p. So the rank of I minus H of x is n minus p.
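The trace argument is easy to verify directly. Here's a small numpy sketch with an arbitrary design matrix (an intercept plus two covariates, so p = 3, with n = 50; my own illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3

# Design matrix: an intercept column plus two arbitrary covariates.
Xd = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])

# Hat matrix H = X (X'X)^{-1} X'.
H = Xd @ np.linalg.solve(Xd.T @ Xd, Xd.T)
I_minus_H = np.eye(n) - H

print(np.allclose(I_minus_H @ I_minus_H, I_minus_H))  # idempotent
print(round(np.trace(I_minus_H)))                     # trace is n - p = 47
print(np.linalg.matrix_rank(I_minus_H))               # rank equals the trace: 47
```

Both the trace and the rank come out to n - p, exactly as the argument above says they should.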
The rank of (I minus H of x) over sigma squared is the same, because I've just multiplied by a scalar. So what we get, according to this result, is that our residual sum of squares, e transpose e, divided by sigma squared is exactly chi squared with n minus p degrees of freedom. Another way to write this is that n minus p times S squared, our variance estimate, divided by sigma squared is chi squared n minus p. Notice that as a special case we get the ordinary chi squared result for normal data with just a mean; that's the case where we have only an intercept in our linear regression model. There p equals 1, so this proves that (n minus 1) S squared over sigma squared is chi squared with n minus 1 degrees of freedom, exactly like we show in an introductory statistics class, okay? This is a very handy result for proving very general chi squared results for quadratic forms, and we'll find it very useful throughout the class.
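Finally, here's a simulation sketch of the residual result (my own illustration, assuming numpy and an arbitrary small design with n = 20 and p = 3): the scaled residual sum of squares e transpose e over sigma squared should have the moments of a chi squared n minus p variable.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 3
sigma = 2.0

# Arbitrary design (intercept plus two covariates) and coefficients.
Xd = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = np.array([1.0, -0.5, 2.0])
H = Xd @ np.linalg.solve(Xd.T @ Xd, Xd.T)   # hat matrix

# Simulate y = X beta + sigma * noise many times, keep e'e / sigma^2.
reps = 100_000
Y = Xd @ beta + sigma * rng.standard_normal((reps, n))
E = Y - Y @ H                               # H is symmetric, so this is (I - H) y row by row
q = np.einsum("ij,ij->i", E, E) / sigma ** 2

# Chi squared with n - p = 17 degrees of freedom: mean 17, variance 34.
print(q.mean(), q.var())
```

The empirical mean and variance should sit near 17 and 34, matching the chi squared n minus p claim.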