In the last lesson, we talked about a binary instrument, and the treatment itself, which the instrument affects, was also binary. That led us to the complier average causal effect, or CACE. Principal stratification can be viewed as a generalization of that idea. Recall that with the CACE, we were estimating an effect within the latent sub-population of compliers. We can generalize this to estimation within other latent sub-populations, now called principal strata. These principal strata are defined by the post-treatment potential outcomes M(0) and M(1), just as the compliers were in the case of the CACE, and they lead to the estimation of so-called principal stratum effects. What we're doing is estimating the difference in the response Y between treatment and control, E[Y(1) - Y(0) | M(0) = m0, M(1) = m1], but within these latent sub-populations. They're latent, of course, because for each subject we'll observe either M(0) or M(1), but certainly not both. We can define the effect unconditionally, as above, or conditional on covariates X, as E[Y(1) - Y(0) | M(0) = m0, M(1) = m1, X = x]. Again, I want to mention that these effects do not require us to define potential outcomes Y(z, m), which in some contexts is an advantage and in others is just a difference. Going back to the previous lesson, we looked at the special case in which the intermediate outcome M indicated whether or not a subject took up the assigned treatment. We then used an exclusion restriction and the monotonicity condition to identify the CACE, and we saw that the ITT reduced to the product of the CACE and the probability of being a complier: ITT = CACE × P(complier). More generally, you may want to relax these assumptions and allow M to have more than two categories, or to be continuous. You may also want to allow for the possibility that treatment assignment depends on covariates.
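To make the ITT decomposition from the previous lesson concrete, here is a small simulation sketch, assuming numpy is available. The strata shares, the outcome model and the effect size of 2 are all hypothetical choices for illustration; the data are generated so that monotonicity (no defiers) and the exclusion restriction hold by construction. Under those conditions the ITT should come out close to CACE × P(complier), and dividing the ITT by the estimated complier share recovers the CACE.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical strata shares: compliers (M(0)=0, M(1)=1), never-takers (0, 0),
# always-takers (1, 1). Monotonicity rules out defiers (1, 0).
strata = rng.choice(["complier", "never", "always"], size=n, p=[0.5, 0.3, 0.2])
m0 = (strata == "always").astype(int)   # treatment taken if assigned control
m1 = (strata != "never").astype(int)    # treatment taken if assigned treatment

# Potential outcomes. Exclusion restriction: assignment affects Y only through
# treatment taken, so Y(1) == Y(0) for never-takers and always-takers.
y0 = rng.normal(0.0, 1.0, size=n) + 1.0 * (strata == "always")
true_cace = 2.0
y1 = y0 + true_cace * (strata == "complier")

z = rng.integers(0, 2, size=n)   # randomized assignment
m = np.where(z == 1, m1, m0)     # observed intermediate outcome
y = np.where(z == 1, y1, y0)     # observed outcome

itt = y[z == 1].mean() - y[z == 0].mean()
p_complier = m[z == 1].mean() - m[z == 0].mean()  # P(complier) under monotonicity
cace_hat = itt / p_complier                        # ITT / P(complier)

print(f"ITT={itt:.3f}  P(complier)={p_complier:.3f}  CACE estimate={cace_hat:.3f}")
```

With a complier share of 0.5 and a CACE of 2, the ITT should be near 1; if you add defiers or let Y(1) differ from Y(0) for never-takers, the ratio no longer recovers the CACE.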
You may not have unconfoundedness overall, but you may have unconfoundedness given covariates. Or you may simply be interested in treatment effects that vary with covariates. Let me give you an example of principal stratification. Consider the following question: does coronary bypass surgery improve a subject's quality of life? That seems to be a causal question, so we want to compare a subject's quality of life when he or she undergoes surgery with the quality of life when he or she forgoes surgery. You can see immediately that there is a problem: you can only ascertain the quality of life for subjects who survive. So let Z_i be 1 if subject i receives surgery and 0 otherwise, and let M_i be 1 if subject i survives and 0 otherwise. One solution is to simply define the quality of life as 0 for subjects who do not survive, but this seems arbitrary. Another solution is to compare the survivors who received surgery with the survivors who did not. But recall that even if assignment to Z is randomized, this only gives you a descriptive comparison, E[Y | Z = 1, M = 1] - E[Y | Z = 0, M = 1]. Randomization lets us remove the conditioning on Z, turning this into E[Y(1) | M(1) = 1] - E[Y(0) | M(0) = 1], but it doesn't let us remove the conditioning on M, so the two terms average over different groups of subjects. This is very similar to where we started the whole set of lessons on mediation. Unless we're willing, as in the case of mediation, to assume that survival status M is independent of the potential outcomes Y(z, m), or something weaker but still not very reasonable, we can't really make a causal comparison. However, if we confine our attention to those who would survive under either condition, which seems to be the reasonable comparison here in any case, we obtain the estimand E[Y(1) - Y(0) | M(0) = 1, M(1) = 1], the difference between the treatment and control outcomes in the group that would survive both with and without surgery. This is the so-called survivor average causal effect (SACE). There are other examples where this kind of estimand is of interest, as well as other estimands that fit into the principal stratification framework.
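A simulation sketch can show why the naive survivor comparison is not the SACE. Everything here is hypothetical: the survival strata shares, the monotonicity assumption that surgery never kills anyone who would survive without it (so the stratum (1, 0) is empty), and a quality-of-life model in which surgery raises quality of life by 1 for everyone, but the subjects saved by surgery are sicker to begin with. The SACE is computed as an oracle from both potential outcomes, which real data never provide.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical survival strata (M(0), M(1)): always-survivors (1, 1),
# 'protected' (0, 1) who survive only with surgery, never-survivors (0, 0).
# Monotonicity is assumed, so the stratum (1, 0) is empty.
strata = rng.choice(["always", "protected", "never"], size=n, p=[0.6, 0.25, 0.15])
m0 = (strata == "always").astype(int)
m1 = (strata != "never").astype(int)

# Hypothetical quality of life: the protected group is sicker at baseline,
# and surgery adds 1 for everyone.
baseline = np.where(strata == "protected", 3.0, 6.0) + rng.normal(0.0, 1.0, size=n)
y0 = baseline
y1 = baseline + 1.0

z = rng.integers(0, 2, size=n)       # randomized surgery
m = np.where(z == 1, m1, m0)         # observed survival
y = np.where(z == 1, y1, y0)
y = np.where(m == 1, y, np.nan)      # quality of life unobserved if not surviving

# Naive comparison of survivors across arms (descriptive, not causal).
naive = np.nanmean(y[z == 1]) - np.nanmean(y[z == 0])

# Oracle SACE: effect among those who survive under either condition.
always = strata == "always"
sace = (y1[always] - y0[always]).mean()

print(f"naive survivor comparison={naive:.3f}  SACE={sace:.3f}")
```

With these choices the naive comparison comes out around 0.12, far below the SACE of 1, because the treated survivors include the sicker protected subjects while the control survivors do not.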
For these, there's a nice, very accessible paper by Page, Feller, Grindal, Miratrix and Somers. As before, identifying effects within principal strata will require assumptions. The monotonicity assumption is often made, and in many contexts it's reasonable. For example, if we're studying test outcomes, you might be willing to assume that a student studies more when encouraged to study than when not. This is an example we used before, and here the mediator, study time, is continuous. More generally, in randomized studies where treatment is designed to operate through a mediating variable, one might be willing to make some sort of monotonicity assumption. Similarly, in a study of fitness, a subject assigned to an exercise program or a diet may exercise or diet more than he or she would have if not assigned. In some instances, it may also be reasonable to assume that there is no effect on the outcome in strata where the intermediate outcome is unaffected by treatment. In general, though, more assumptions are going to be required to identify causal effects within principal strata when M takes on more than two values. As an example, suppose M takes 3 ordered values, 0, 1 and 2. Then there are 9 principal strata (M(0), M(1)), and 3 of them, (1, 0), (2, 0) and (2, 1), have M(1) < M(0). Monotonicity says these are not possible, so the probabilities of those 3 strata are 0. Under monotonicity, we end up with 5 of the 9 principal stratum probabilities identified: the 3 that are 0, plus P(M(0) = 0, M(1) = 0) = P(M(1) = 0) and P(M(0) = 2, M(1) = 2) = P(M(0) = 2). But without at least one additional restriction, the remaining 4 probabilities are not identified. In other words, monotonicity alone isn't going to give us identification of all the principal stratum probabilities when M takes 3 ordered values.
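The counting argument for the 3-valued case can be checked numerically. This sketch (numpy assumed; the strata probabilities are made up) encodes the 6 strata allowed by monotonicity, computes the marginal distributions of M(0) and M(1) that a randomized experiment reveals, and exhibits two different sets of strata probabilities with identical margins. Only the (0, 0) and (2, 2) probabilities can be read off the margins.

```python
import numpy as np

# Principal strata (M(0), M(1)) with M in {0, 1, 2}. Monotonicity, M(1) >= M(0),
# leaves 6 of the 9 strata with possibly nonzero probability.
STRATA = [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]


def observed_margins(p):
    """Return P(M(0)=k) and P(M(1)=k), k = 0, 1, 2, implied by strata probs p.

    Under randomization these are the distributions of M in the control and
    treatment arms, i.e. all that the data reveal about the strata.
    """
    m0 = np.zeros(3)
    m1 = np.zeros(3)
    for (a, b), prob in zip(STRATA, p):
        m0[a] += prob
        m1[b] += prob
    return m0, m1


# Hypothetical strata probabilities, in the same order as STRATA.
p_a = np.array([0.20, 0.10, 0.10, 0.20, 0.10, 0.30])

# Shift mass t along a direction the margins cannot see:
# +t on (0, 1) and (1, 2), -t on (0, 2) and (1, 1).
t = 0.05
p_b = p_a + t * np.array([0, 1, -1, -1, 1, 0])

m0_a, m1_a = observed_margins(p_a)
m0_b, m1_b = observed_margins(p_b)

# The two identified probabilities: M(1) = 0 forces stratum (0, 0),
# and M(0) = 2 forces stratum (2, 2).
p00_hat = m1_a[0]
p22_hat = m0_a[2]

print("same margins:", np.allclose(m0_a, m0_b) and np.allclose(m1_a, m1_b))
print("same strata probabilities:", np.allclose(p_a, p_b))
```

The shift direction (0, 1, -1, -1, 1, 0) is, up to scale, the only one the margins cannot detect, which is why a single well-chosen additional restriction can pin down the remaining 4 probabilities.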
Furthermore, even if a restriction were imposed to identify the principal stratum probabilities, the principal stratum effects would not be identified without further restrictions on their values. For example, suppose we now knew the principal stratum probabilities. By monotonicity, if you see that M(1) = 0, you know that M(0) must have been 0 as well, so the treated subjects with M = 0 all belong to the stratum (0, 0), and the mean of Y(1) in that stratum is identified from them. Similarly, if you see that M(0) = 2, then M(1) must be 2, so the control subjects with M = 2 all belong to the stratum (2, 2), and the mean of Y(0) in that stratum is identified. However, you can show that the remaining quantities, and hence the principal stratum effects, are not identified without imposing further restrictions. If M takes on more than 3 values, identification will require even stronger assumptions. As before, for a continuous intermediate outcome M, one might use some kind of parametric mixture model with covariates. For some examples of how one might go about this, there is a paper by Jin and Rubin (2008) in the Journal of the American Statistical Association, and a paper by Joffe, Small and Hsu (2007) in Statistical Science. Both of these are very nice papers. All right, I'm going to switch topics now and take up regression discontinuity designs, which, as it turns out, are related to instrumental variables. The regression discontinuity design arises when treatment depends on thresholding a continuous score. So that you can see the similarities and differences between this design and its analysis and the previous material on mediation, instrumental variables and compliance, I will denote the score Z_i, the treatment M_i and the outcome Y_i.
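Here is a minimal sketch of this notation, assuming the sharp case in which treatment is fully determined by whether the score reaches the threshold. The cutoff c = 0, the uniform score distribution and the outcome model with a jump of 0.5 at the cutoff are hypothetical; the point is only that M_i is a deterministic function of Z_i, unlike the randomized assignment in the instrumental-variables setting.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000
c = 0.0  # hypothetical cutoff on the score

z = rng.uniform(-1.0, 1.0, size=n)   # continuous score Z_i
m = (z >= c).astype(int)             # treatment M_i switches on at the cutoff

# Hypothetical outcome Y_i: smooth in the score, plus a jump of 0.5 for treated units.
y = 1.0 + 0.8 * z + 0.5 * m + rng.normal(0.0, 0.1, size=n)

print("treated share:", m.mean())
```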