In this module, we're going to talk about natural and quasi-experiments as one way of trying to establish a causal relationship between two variables. The idea of a natural or quasi-experiment is to find a population in which we can observe a relationship between X and Y. Perhaps we're interested in stress during pregnancy, so we treat stress during pregnancy as our X, and Y is the outcome of the pregnancy. This is a common topic: people are very interested in whether stress during pregnancy affects the outcome of the pregnancy. One of the problems we have in studying it is that there are a lot of variables out there, a lot of Ws, that might lead to stress during pregnancy and might also affect the outcome of the pregnancy. For example, an expectant mother's own personality characteristics, her economic situation, and other factors might affect both the stress she experiences during pregnancy and the outcome of the pregnancy. In a natural or quasi-experiment, we look for a population that we can divide into two groups according to whether a subpopulation experienced a shock that affected the X variable, in this case something that induced stress for expectant mothers, and then we see whether that shock leads to a response in Y. Here, the variation in X is exogenous: it is created by a shock, some kind of stressor, and we can then look at whether it led to a change in Y. We detect the effect of that exogenous change in X by comparison with another subpopulation that did not experience the same shock. There are many sources of exogenous variation in X variables, that is, right-hand-side variables, that people carrying out natural or quasi-experiments exploit. One, which we alluded to on the previous slide and will come back to, is natural disasters.
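To make this logic concrete, here is a minimal simulation sketch. Everything in it is hypothetical: the coefficients, the assumed true effect of 0.5, and the assumption that the shock hits a random half of the population. The confounder W raises both stress X and the outcome Y, so a naive regression of Y on X overstates the effect. Because the shock S is independent of W, comparing the shocked and unshocked subpopulations, and scaling the difference in mean Y by the difference in mean X (a Wald ratio, one standard way to turn such a group comparison into an effect of X), recovers the assumed effect.

```python
import random

# Hypothetical simulation: all coefficients are illustrative assumptions.
TRUE_EFFECT = 0.5  # assumed causal effect of stress X on outcome Y


def simulate(n, seed=0):
    """Return (s, x, y) tuples for n hypothetical expectant mothers."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = rng.gauss(0, 1)                      # unobserved confounder W
        s = 1 if rng.random() < 0.5 else 0       # exogenous shock, independent of W
        x = w + 2.0 * s + rng.gauss(0, 1)        # stress depends on W and on the shock
        y = TRUE_EFFECT * x + 1.5 * w + rng.gauss(0, 1)  # W also affects Y directly
        rows.append((s, x, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_slope(rows):
    """OLS slope of Y on X, ignoring W; biased because W drives both."""
    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def shock_comparison(rows):
    """Difference in mean Y between shocked and unshocked groups,
    divided by the difference in mean X (a Wald ratio)."""
    shocked = [(x, y) for s, x, y in rows if s == 1]
    unshocked = [(x, y) for s, x, y in rows if s == 0]
    dy = mean([y for _, y in shocked]) - mean([y for _, y in unshocked])
    dx = mean([x for x, _ in shocked]) - mean([x for x, _ in unshocked])
    return dy / dx


rows = simulate(100_000)
print(f"naive slope: {naive_slope(rows):.2f}")            # close to 1.0, not 0.5
print(f"shock comparison: {shock_comparison(rows):.2f}")  # close to 0.5
```

With these assumed coefficients, the naive slope converges to about 1.0 because W inflates the covariance between X and Y, while the shock-based comparison stays near 0.5.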
So if we're interested in the relationship between stress and health, there are usually all sorts of things that could affect both stress and health, and we worry about omitted variables. But with natural disasters, we can identify examples where a disaster such as an earthquake generated a lot of stress for people without directly affecting their health very much, assuming they survived and were not injured by the disaster itself. Another common source of exogenous variation is policy changes. There are examples of changes in the law that affect things we're interested in as X, or right-hand-side, variables whose effect we want to measure, and these changes have nothing to do with individual characteristics. For example, people have sought to measure the effects of education on health and other outcomes by looking at situations where the law on, say, the minimum amount of schooling changed in a very direct or discrete way, and this affected the amount of education people received with little regard for their own individual characteristics. So if you see differences in outcomes between people who received different amounts of education as a result of the policy change, you have good reason to attribute those differences to education rather than to something else. A third source is plant closings. For people studying the relationship between unemployment and health or other outcomes, there are all sorts of individual-level factors that might affect both the chances that people lose their jobs and other things like their health or stress. But plant closings may be exogenous to the characteristics of the individuals who experience them: they occur regardless of the characteristics of the people affected.
So if you think of plant closings as a source of exogenous variation in the chances of being unemployed, you can exploit that to look at the effects of being unemployed. Some people have looked at lottery winnings to try to understand the effects of income on different outcomes. We're very interested in the effects of income on all sorts of outcomes. The problem is that there are all sorts of things that affect both people's income and the other outcomes we may be interested in, like their health or other types of behavior. Lottery winnings, however, we tend to think of, hopefully, as random. So if people differ in their income or wealth because some of them won the lottery and some didn't, that should be a source of exogenous variation in their income or wealth, and we should therefore be able to measure its effects on the outcomes we're interested in. Let's go back to our example of stress and pregnancy outcomes. A topic of great interest is whether maternal stress affects pregnancy outcomes. The big problem is that there is an endless number of other factors that might both influence the amount of stress a mother experiences during her pregnancy and have their own direct effects on pregnancy outcomes, for example, family disruption, economic shocks, and so forth. These might all affect the amount of stress a woman experiences while pregnant, but they might also, in quite different ways, directly affect the pregnancy outcome, for example by affecting the mother's diet, which in turn affects the pregnancy.
Well, what people have done is look for situations where stress is plausibly the result of some exogenous shock, such as an earthquake, and then examine how the stress associated with the earthquake relates to pregnancy outcomes. Basically, they compared people who were affected by an earthquake with otherwise similar people who were not, ideally people who differed only in that they were not affected by the earthquake, to see whether there was a difference in pregnancy outcomes. To see how this works, let's look at it in a bit more detail. Suppose we have, say, five different mothers who conceived at different times, and we're specifically interested in whether stress during a specific period of the pregnancy, identified here in red, has a particular effect on the outcome of the pregnancy. There's a lot of good theory as to why stress experienced at very specific points during a pregnancy might alter its outcome. So we might look at mothers in a population who gave birth over a period of several months. Now, if we work backward, and there was a discrete shock that affected the mothers in this population at a specific point in time, then for a subset of those mothers the pregnancy would have been in this crucial phase at the time of the shock, while for the other mothers it would not.
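The step of working backward from conception dates to decide who was in the critical phase when the shock hit can be sketched as below. The shock date, the 8-to-16-week window, and the five conception dates are all hypothetical values chosen for illustration; a real study would take the window from the underlying theory and estimate conception from birth dates and gestational age.

```python
from datetime import date, timedelta

# Hypothetical values for illustration only.
SHOCK_DATE = date(2010, 6, 1)   # assumed date of the shock (e.g. an earthquake)
WINDOW_WEEKS = (8, 16)          # assumed critical window, in weeks after conception


def in_critical_window(conception, shock=SHOCK_DATE, window=WINDOW_WEEKS):
    """True if the shock fell inside this pregnancy's critical window."""
    start = conception + timedelta(weeks=window[0])
    end = conception + timedelta(weeks=window[1])
    return start <= shock < end


# Five hypothetical mothers, ordered by conception date.
conceptions = {
    "A": date(2009, 12, 10),
    "B": date(2010, 1, 15),
    "C": date(2010, 2, 20),
    "D": date(2010, 3, 25),
    "E": date(2010, 5, 1),
}

groups = {
    name: "treatment" if in_critical_window(c) else "control"
    for name, c in conceptions.items()
}
print(groups)
# {'A': 'control', 'B': 'control', 'C': 'treatment', 'D': 'treatment', 'E': 'control'}
```

Mothers who conceived slightly earlier (A, B) or later (E) passed through the window before or after the shock, so they serve as the comparison group for C and D.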
So essentially, we get our treatment and control groups by taking the women who, according to the timing of their births or conceptions, would have been in the critical phase at the time of the earthquake, the external shock, and comparing them to mothers who gave birth just before and just after, and who would therefore have been otherwise similar except that their pregnancies were a little earlier or a little later. We assume that whether a woman ends up in the control group or the treatment group is largely random, in the sense that the exact timing of conception has a large random component. When we look at policy changes, what people often do is look at situations where a policy was introduced in such a way that it applied to different subgroups of people at different times. In some cases, people look at the effect of a single nationwide policy change, for example, an increase in the minimum amount of education in a country. They might then compare people on either side of the policy change, the assumption being that people who were just a little older or a little younger when the policy changed would be otherwise similar on many of the characteristics we might worry about as sources of omitted-variable bias. So in some studies of the effects of education, researchers look for situations where one set of minimum-education laws applied to everybody born up to a specific date, for example, everyone born on or before December 31 of a certain year, while everyone born from January 1 onward was subject to a different set of laws.
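A comparison across a birth-date cutoff like this is the basic idea behind what is often called a regression discontinuity design. The simplest version, a difference in mean outcomes within a narrow window on either side of the cutoff, is sketched below; the cutoff date, the 30-day bandwidth, and the data are hypothetical. Applied work typically fits trends in birth date on each side of the cutoff rather than comparing raw means, so treat this only as an illustration of the comparison.

```python
from datetime import date

# Hypothetical cutoff: people born on or after this date fall under the new law.
CUTOFF = date(1950, 1, 1)


def cutoff_comparison(people, cutoff=CUTOFF, bandwidth_days=30):
    """Mean outcome of people born just after the cutoff minus mean outcome
    of people born just before it, using only births within bandwidth_days.

    people: iterable of (birth_date, outcome) pairs.
    """
    after, before = [], []
    for born, outcome in people:
        gap = (born - cutoff).days
        if 0 <= gap < bandwidth_days:
            after.append(outcome)
        elif -bandwidth_days <= gap < 0:
            before.append(outcome)
    if not after or not before:
        raise ValueError("need observations on both sides of the cutoff")
    return sum(after) / len(after) - sum(before) / len(before)


# Invented data: (birth date, years of schooling).
people = [
    (date(1948, 6, 1), 6),     # far from the cutoff: excluded
    (date(1949, 12, 20), 8),
    (date(1949, 12, 31), 9),
    (date(1950, 1, 1), 10),
    (date(1950, 1, 15), 11),
]
print(cutoff_comparison(people))  # 2.0
```

Narrowing the bandwidth makes the two groups more alike but leaves fewer observations, which is the basic trade-off in choosing how close to the cutoff to look.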
Then we might imagine that people born on December 31 and those born on January 1 wouldn't differ all that much, except that, by the luck of the draw, some were born before the policy change and some after. In other situations, for example in the United States, where policy changes are commonly made state by state, we can go further, because we can make these before-and-after comparisons between people who were not subject to the policy and people who, perhaps because they were born a few days later or for some other reason, were affected by it. And because the policy may have been introduced at different times in different places, for example in different states, we can make multiple comparisons in different settings, which helps us check whether the people on one side of the policy change really differed systematically from those on the other side. This approach has been used in a variety of settings, for example in the United States, where laws changed on a state-by-state basis at different points in time, such as laws legalizing abortion or changing access to guns; these again vary by state, and researchers make before-and-after comparisons. Now, there are some strengths and limitations we have to think about with natural and quasi-experiments. First of all, natural and quasi-experiments may well have very high internal validity. Within the population that is the subject of the comparison, we may have a pretty good argument that the exogenous shock, whether a natural disaster or a policy change, really did lead to a change in the outcome.
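The state-by-state before-and-after comparisons described above follow the logic of difference-in-differences: the change in an adopting state around the time of the law is compared with the change over the same period in states that did not adopt it. The sketch below uses invented state names, adoption years, and outcome values, and for simplicity uses only never-adopting states as controls. The key assumption is that, without the policy, adopting and non-adopting states would have followed parallel trends.

```python
from statistics import mean

# Hypothetical data: adoption year per state (None = never adopted).
adoption = {"State1": 2001, "State2": 2003, "State3": None}

# Invented outcome values per (state, year); only the years used below.
outcome = {
    ("State1", 2000): 12.0, ("State1", 2001): 11.5,
    ("State2", 2002): 9.0,  ("State2", 2003): 8.5,
    ("State3", 2000): 10.0, ("State3", 2001): 10.5,
    ("State3", 2002): 11.0, ("State3", 2003): 11.5,
}


def change(state, year):
    """Change in the outcome from the year before `year` to `year`."""
    return outcome[(state, year)] - outcome[(state, year - 1)]


def did_for_state(state):
    """Before/after change in an adopting state minus the average
    change over the same years in never-adopting states."""
    year = adoption[state]
    controls = [s for s, y in adoption.items() if y is None]
    return change(state, year) - mean(change(c, year) for c in controls)


estimates = {s: did_for_state(s) for s, y in adoption.items() if y is not None}
print(estimates)                 # {'State1': -1.0, 'State2': -1.0}
print(mean(estimates.values()))  # -1.0
```

Because the two states adopted in different years, each gives a separate comparison; getting similar estimates from both, as in this invented example, is the kind of consistency that makes a systematic difference between the two sides of the policy change less likely.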
Just to recall, internal validity means that we can actually show that, for the population targeted by the study, the change in X really did change Y. However, natural and quasi-experiments may have limited external validity, in the sense that it may be difficult to generalize the results beyond the specific population or subpopulation that was the subject of the study. For example, the size of the effect may differ in other populations or settings. The concern is that the effects may be tied to the context of the population in which the study took place. It may also be that natural and quasi-experimental studies are only practical in a limited number of situations, since they depend on finding a suitable exogenous shock.