In the previous module, we discussed principal stratification. Now we want to move on to regression discontinuity, which appears here in the discussion because it has some relationships to mediation and to instrumental variables, although they're not always obvious or at least framed in that way. These regression discontinuity designs were pioneered by Thistlethwaite and Campbell, educational psychologists, nearly 60 years ago in an educational study to evaluate the effect of receiving an award on subsequent outcomes, such as receiving scholarships. The key thing is that the receipt of an award was based solely on a test score Z. Thistlethwaite and Campbell reasoned that students just above the cut-off and students just below it can be compared. Thus the effect of an award could be estimated, at least right around the cut-off value Z0. These kinds of assignment processes are not at all unusual. In medicine, persons who have cholesterol levels above a certain number may be prescribed a pill. Similarly, a surgery may be performed only when the value of a critical indicator has been passed. In urban planning, only streets where one or more fatalities have occurred in the past year may see a change in the speed limit. Such designs are also popular in the social sciences and have been used and studied extensively, especially by econometricians. Readers who want to read about the applications in economics may wish to look at the review paper by Lee and Lemieux. The Journal of Econometrics devoted a special issue to this topic in 2008, as did the journal Observational Studies in 2016. At first, it might seem odd that anything can be learned from a design where there is no overlap in the score between units receiving and not receiving treatment.
In Causal Inference 1, we focused on the unconfoundedness condition, noting that problems ensued when it was not possible to find observations in the treatment and control groups with similar values of the covariates or propensity score. Now the question is: what can we learn when the unconfoundedness condition holds but the overlap assumption may not? Clearly, you cannot, at least without making lots of possibly unjustified assumptions, learn about treatment effects far away from the cut-off. We'll focus on the neighborhood right around the cut-off. The following exposition parallels the treatment in a paper by Hahn, Todd and Van der Klaauw in 2001 in Econometrica. It's a very short paper, just a little note, but it's such a nice, clear paper, so let me try to follow them. Each unit i has a score Zi, and there is a cut-off Z naught. The treatment indicator Mi, which depends on Zi, takes the value 0 or 1, and the observed outcome is Yi of Mi: you see Yi1 if Mi is 1 and Yi0 if Mi is 0. The potential outcomes are Yi0 and Yi1. The thing we want to estimate is the expected value of the treatment effect Yi1 minus Yi0 at the cut-off point Z naught. Now there are going to be two types of regression discontinuity designs of interest. The first is the so-called sharp design. If you're at or above the cut-off, you get treated. Otherwise, you don't get treated, and that's it. In this design, the probability that you get treated clearly doesn't depend on the potential outcomes. It depends only on which side of the cut-off your score falls: the probability of treatment given Z is 1 at or above the cut-off and 0 below it. Treatment is clearly unconfounded, right? Because the treatment is independent of the potential outcomes given Z. But at or above the threshold, there are no control units to compare with treated units, and below the cut-off, there are no treated units to compare with the controls. So you have unconfoundedness, but you don't have overlap. Now in a fuzzy design, assignment is probabilistic.
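To make the sharp design concrete, here is a minimal simulated sketch (the cut-off of 0.5, the true effect of 2.0, and the linear baseline are all invented for illustration, not from the lecture): we compare mean outcomes in a narrow band on either side of the cut-off.

```python
import numpy as np

rng = np.random.default_rng(0)
n, z0, tau = 100_000, 0.5, 2.0   # sample size, cut-off, true effect (made up)

z = rng.uniform(0, 1, n)              # score Z
m = (z >= z0).astype(int)             # sharp design: treated iff Z >= Z0
baseline = 1.0 + 3.0 * z              # E[Y(0) | Z], smooth (continuous) at Z0
y = baseline + tau * m + rng.normal(0, 0.5, n)

# compare units just above and just below the cut-off
h = 0.02                              # narrow bandwidth
above = y[(z >= z0) & (z < z0 + h)].mean()
below = y[(z < z0) & (z >= z0 - h)].mean()
est = above - below                   # estimates tau, up to O(h) bias from the slope
```

Because treatment is deterministic in Z, unconfoundedness holds automatically, but there is no overlap: every unit in the upper band is treated and every unit in the lower band is not.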
Basically, the idea is that you're much more likely to get treated if you're above the cut-off than if you're below it, but there may be units slightly below the cut-off that end up getting treated anyway and units slightly above the cut-off that don't get treated. That's a fuzzy design. It's common to make the so-called regression discontinuity assumption. M minus is the limit of the probability of being treated given Z as you approach Z naught from below. M plus is the limit as you approach Z naught from above, and you don't expect them to be the same. M plus should be greater than M minus, and the so-called regression discontinuity assumption is that these two limits aren't equal. Now, using this regression discontinuity assumption, we're going to compare units just above and just below the cut-off. Let's look at the expected value of Y when Z is just slightly above the cut-off and compare it to the expected value of Y when Z is just slightly below the cut-off. Let beta i denote the treatment effect Yi1 minus Yi0. We can rewrite the expectation of Y when Z is just slightly above the cut-off using this beta i: it's the expectation of beta i given that Z is just slightly above the cut-off and you get treated, times the probability that you get treated given that Z is just above the cut-off, plus the expectation of Y0 given that Z is just above the cut-off. That's how the whole thing reduces. Of course, the left side is what we actually see: the expected value of Y given Z equals Z naught plus epsilon. We've now broken that down, and you notice this probability that M equals 1 when Z equals Z naught plus epsilon. For a sharp design, that's just going to be 1, but for a fuzzy design, it'll be something less than 1. We can decompose similarly right below the cut-off.
Then if we take the difference, remember that's something we can in theory see, that actual difference on the left side, but now we're going to express it in terms of the treatment effect. We've got the expectation of the treatment effect when Z is slightly above the cut-off and you receive treatment, times its probability, minus the expected value of the treatment effect when Z is slightly below the cut-off and you receive treatment, times its probability. Okay, that makes a lot of sense, right? Plus the difference in the expected values of the Y zeros, that's when you don't receive treatment, of course. We're comparing the Y zeros at slightly above the cut-off and slightly below the cut-off. This decomposition is very important. Now we want to take the limits as epsilon tends to zero. Remember that the intuition, going way back, was that units just to the left and right of the cut-off would have similar outcomes in the absence of treatment because they're starting out similar. So we're going to make a continuity assumption: the limit from the left, i.e., as you approach Z naught from below, of the expectation of Y0 is equal to the limit from the right, as you approach Z naught from above. As you come up and as you come down, the limit is the same, which, as you remember from elementary calculus, is continuity. The regression discontinuity is in the probability of treatment just below and above the cut-off, but in the absence of treatment, you'd think, "Okay, the units just to the right and just to the left of the cut-off are quite similar," and that's what motivates the continuity assumption. Okay, a little more of this. For the terms in this equation involving beta i, those first two, note that if the treatment effects are constant, beta i is equal to beta for all i, and beta can be pulled out of the expectation.
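In symbols, writing beta i for Yi1 minus Yi0, the decomposition of the observable difference and the continuity assumption just described read:

```latex
\begin{aligned}
E[Y \mid Z = Z_0 + \epsilon] - E[Y \mid Z = Z_0 - \epsilon]
  &= E[\beta_i \mid Z = Z_0 + \epsilon, M = 1]\,\Pr(M = 1 \mid Z = Z_0 + \epsilon) \\
  &\quad - E[\beta_i \mid Z = Z_0 - \epsilon, M = 1]\,\Pr(M = 1 \mid Z = Z_0 - \epsilon) \\
  &\quad + E[Y(0) \mid Z = Z_0 + \epsilon] - E[Y(0) \mid Z = Z_0 - \epsilon],
\end{aligned}
\qquad
\lim_{\epsilon \downarrow 0} E[Y(0) \mid Z = Z_0 + \epsilon]
  = \lim_{\epsilon \downarrow 0} E[Y(0) \mid Z = Z_0 - \epsilon].
```

Under the continuity assumption, the last two terms cancel as epsilon tends to zero, leaving only the terms involving beta i.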
It's then identified and equal to Y plus minus Y minus divided by M plus minus M minus, where Y plus is the limit of the expected outcome as you approach Z naught from above and Y minus is the limit from below. Similarly, M plus and M minus are the limits from above and below, as you approach Z naught, of the probability of being treated. The constant effect assumption is very strong, but it is worth noting that the result above is obtained without assuming treatment is unconfounded, and that's true in the sharp or the fuzzy design. In the sharp design, M plus minus M minus is just 1. But in the more interesting and realistic case where the treatment effects are heterogeneous, now we're going to have to make some assumptions about unconfoundedness. Let's assume unconfoundedness, by which we mean here that Y1 minus Y0 is independent of the treatment you receive given your score. If we assume something like that, then you look at the next equation: basically, we can take the M out of the expectation of beta i. Then again, it would be reasonable to assume that the expectation of beta i given Z is continuous at Z naught. Under this assumption, the previous continuity assumption on the Y zeros, the regression discontinuity assumption, and the unconfoundedness assumption, the expectation of the treatment effect at the value Z naught is identified, and it's equal to, again, Y plus minus Y minus divided by M plus minus M minus. You can see that this M plus minus M minus is very similar to the effect of treatment assignment on the mediator when we discussed IVs and related topics. Of course, like there, it has to be greater than zero or less than zero, and in general it will be greater than zero in this context, but it can't be zero. That's the point: assignment has an effect on treatment. Finally, I want to take up the case where Z plays the role of an instrument. Here, unlike the case we studied earlier, the instrument is continuous.
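Here is a minimal simulated sketch of that ratio (the jump in treatment probability from 0.2 to 0.8 and the constant effect of 2.0 are invented for illustration): the jump in the mean outcome at the cut-off, divided by the jump in the treatment probability, recovers the effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n, z0, tau = 200_000, 0.5, 2.0   # sample size, cut-off, true effect (made up)

z = rng.uniform(0, 1, n)
# fuzzy design: treatment probability jumps from 0.2 to 0.8 at the cut-off
p = np.where(z >= z0, 0.8, 0.2)
m = rng.binomial(1, p)
y = 1.0 + 3.0 * z + tau * m + rng.normal(0, 0.5, n)

h = 0.02                                        # narrow bandwidth around the cut-off
hi = (z >= z0) & (z < z0 + h)
lo = (z < z0) & (z >= z0 - h)
y_plus, y_minus = y[hi].mean(), y[lo].mean()    # approximate Y+ and Y-
m_plus, m_minus = m[hi].mean(), m[lo].mean()    # approximate M+ and M-
est = (y_plus - y_minus) / (m_plus - m_minus)   # ratio estimator
```

Note that in the sharp design the denominator would be exactly 1 and the ratio collapses to the simple difference in outcomes.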
Now we use the regression discontinuity assumption, which remember is about the Ms right below and above the cut-off; the continuity assumption, which is about the Y zeros right below and above the cut-off; and the unconfoundedness assumption, which is similar to the unconfoundedness assumption in the Angrist et al. article. We also want to make an additional monotonicity assumption in the neighborhood of Z naught, similar to the monotonicity assumption Angrist et al. made. Hahn and his co-authors show that Y plus minus Y minus over M plus minus M minus is then the complier average causal effect at the cut-off Z naught. That's the ATE for subjects who would not take treatment at Z naught minus epsilon but who would take treatment at Z naught plus epsilon. The rest of the Hahn paper is about estimation: how would you estimate this? They talk about using local linear nonparametric regression estimators. Readers who want to take that next step and are interested in estimation should consult the Hahn et al. paper or the more recent literature on this topic.
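As a rough sketch of local linear estimation in the fuzzy case (a minimal illustration with made-up simulated data, not the Hahn et al. implementation): fit a straight line to the outcome on each side of the cut-off within a bandwidth, read off the fitted values at the cut-off, do the same for treatment, and take the ratio of the two jumps.

```python
import numpy as np

def boundary_fit(z, v, z0, h, side):
    """Local linear fit of v on z within bandwidth h, evaluated at the cut-off.
    side=+1 uses points in [z0, z0+h); side=-1 uses points in [z0-h, z0)."""
    sel = (z >= z0) & (z < z0 + h) if side > 0 else (z >= z0 - h) & (z < z0)
    slope, intercept = np.polyfit(z[sel] - z0, v[sel], 1)
    return intercept                      # fitted value at z = z0

rng = np.random.default_rng(2)
n, z0, tau, h = 100_000, 0.5, 2.0, 0.1    # all values made up for illustration
z = rng.uniform(0, 1, n)
p = np.where(z >= z0, 0.8, 0.2)           # fuzzy assignment probability
m = rng.binomial(1, p).astype(float)
y = 1.0 + 3.0 * z + tau * m + rng.normal(0, 0.5, n)

y_jump = boundary_fit(z, y, z0, h, +1) - boundary_fit(z, y, z0, h, -1)
m_jump = boundary_fit(z, m, z0, h, +1) - boundary_fit(z, m, z0, h, -1)
est = y_jump / m_jump
```

Unlike the simple difference of means, the linear fit removes the slope of the regression function within the band, which is why local linear estimators are preferred at boundaries.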