So now let's take up the topic of marginal structural models, which are based on an extension of the weighting approach that I presented in Causal Inference One. These models are called marginal because they are models for the marginal means of the potential outcomes, possibly given a set of baseline covariates, and they are structural because they are models for the potential outcomes, for example the model below.

Recall from Causal Inference One that when treatment assignment Z is strongly ignorable given the covariates, this weighting formula gives you the expected value of Y(1), and similarly we can get the expected value of Y(0) using the same weighting approach. What the weighting approach is essentially doing is adjusting the distribution of the covariates to be the same among treated and untreated units, because after all, our idea is that everybody could receive treatment or everybody could receive control. Intuitively, a subject in the treatment group with covariate value x1 and probability 0.1 of treatment represents one-tenth of the subjects in the population with those covariates, so we need to weight that subject up by a factor of 10. Units in the control group with those covariates have probability 0.9 of remaining untreated, so we only have to weight them up by ten-ninths. In this way, the one treated unit with covariate value x1 now represents 10 units, and the 9 control units with covariate value x1 also represent 10 units. So weighting is used to create a pseudo-population in which treated and control subjects have the same distribution of the covariates. If we average the weighted outcomes in the pseudo-population treatment group, we get the expected value of Y(1), and averaging the weighted outcomes in the control group gives the expected value of Y(0).

Of course, in practice we do not observe the population, and in addition, in an observational study the assignment probabilities are not known and have to be estimated. Remember that if the model for these probabilities is misspecified, this will generally lead to biased estimates of treatment effects.

For a continuous outcome Y, we can compute estimates of E(Y(1)) and E(Y(0)) using weighted least squares regression, where the treated units, those with Z = 1, are weighted inversely to the estimated probability of treatment given the covariates, and the untreated units are weighted inversely to one minus this probability. To take into account the fact that the weights were estimated, we can use robust standard errors. If we had a dichotomous outcome, we might use a logistic regression or probit model instead.

In the foregoing we used so-called unstabilized weights. Had we used stabilized weights, which have P(Z = 1) and P(Z = 0) in the numerators instead of 1, or estimates thereof, we would get the same results, because the model for Y is saturated. In the more complicated setting to which we now turn, it is usually necessary to model the outcomes using unsaturated models, and then it turns out to be better to use stabilized weights, as these lead to narrower confidence intervals for the estimated parameters.

The results above extend to the longitudinal case under the sequential ignorability and positivity conditions. If you look at that formula, you can see that I am basically multiplying these weights up through time t.
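The slide formulas referred to above are not reproduced in this transcript; in the standard notation used by Robins and Hernán, the identities being described are, under (sequential) ignorability and positivity,

$$
E[Y(1)] = E\left[\frac{Z\,Y}{P(Z=1\mid X)}\right], \qquad
E[Y(0)] = E\left[\frac{(1-Z)\,Y}{P(Z=0\mid X)}\right],
$$

with stabilized weights obtained by replacing the numerator 1 with the marginal treatment probability,

$$
SW = \frac{P(Z=1)}{P(Z=1\mid X)} \ \text{for treated units}, \qquad
SW = \frac{P(Z=0)}{P(Z=0\mid X)} \ \text{for control units},
$$

and, in the longitudinal case, weights built up as a product over time,

$$
W_t = \prod_{k=1}^{t} \frac{1}{f(Z_k \mid \bar{Z}_{k-1}, \bar{X}_k)}, \qquad
SW_t = \prod_{k=1}^{t} \frac{f(Z_k \mid \bar{Z}_{k-1})}{f(Z_k \mid \bar{Z}_{k-1}, \bar{X}_k)}.
$$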
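To make the point-treatment recipe concrete, here is a minimal sketch in Python using statsmodels. The data frame, the column names (z, y, x1, x2), and the logistic propensity model are illustrative assumptions, not part of the lecture.

```python
# Minimal IPTW sketch for a single (point) treatment, assuming a pandas
# DataFrame `df` with binary treatment "z", outcome "y", covariates "x1", "x2".
import numpy as np
import pandas as pd
import statsmodels.api as sm

def iptw_fit(df, stabilized=True):
    # 1. Estimate the propensity score e(X) = P(Z = 1 | X) by logistic regression.
    X = sm.add_constant(df[["x1", "x2"]])
    ps = sm.Logit(df["z"], X).fit(disp=0).predict(X)

    # 2. Unstabilized weights: 1/e(X) for treated units, 1/(1 - e(X)) for controls.
    w = np.where(df["z"] == 1, 1.0 / ps, 1.0 / (1.0 - ps))

    # 3. Stabilized weights put the marginal probability P(Z = z) in the numerator.
    if stabilized:
        p_treat = df["z"].mean()
        w *= np.where(df["z"] == 1, p_treat, 1.0 - p_treat)

    # 4. Weighted least squares of Y on Z; robust (sandwich) standard errors are a
    #    simple, conservative way to acknowledge that the weights were estimated.
    design = sm.add_constant(df["z"])
    return sm.WLS(df["y"], design, weights=w).fit(cov_type="HC1")

# The coefficient on z estimates E[Y(1)] - E[Y(0)].
```

Because the outcome model here (Y regressed on the binary Z alone) is saturated, the stabilized and unstabilized weights give the same point estimate, as noted above.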
Robins and Hernán call these weights unstabilized; the stabilized weights are the ones with the additional term in the numerator. To estimate E(Y(z1)) for a particular value z1, one proceeds as before, weighting inversely by an estimate of the assignment probability, using either the stabilized or the unstabilized weight. At the next step you want to estimate E(Y(z1, z2)) for the treatment sub-regimen (z1, z2), using the units observed in period 2 with Z1 = z1 and Z2 = z2. You have to reweight these units so that the observations with covariates x1 and x2 have the same frequency as the observations with the same values of the covariates but with Z1 = z1* and Z2 = z2*, and so on. As before, you need to estimate the weights, and as t increases it will generally be necessary to model the outcome as well, for example with the structural model below. Clearly, our previous concerns about misspecification for the case t = 1 apply here as well, and they apply more forcefully, because the opportunities for misspecification increase as you do the weighting more times. I haven't done it here, but you can also include baseline covariates W in the marginal structural model; in that case the Hernán and Robins book recommends stabilized weights whose numerator models also condition on W, as these lead to smaller standard errors than unstabilized weights.

Both the g-formula and inverse probability of treatment weighting (IPTW) can be used to estimate the treatment effect up through time t, comparing sub-regimens z̄t and z̄t*, possibly given baseline covariates as well, so it is useful to look at the advantages and disadvantages of each approach. As we noted before, using the g-formula generally requires modelling the conditional expectations of Yt, and it will usually require modelling the probability functions for the covariates given prior covariates and prior treatments; both of these can be quite challenging. Using IPTW requires modelling the assignment probabilities conditional on past covariates and assignments. And although it appears that IPTW does not require modelling the outcomes, remember that as t increases the number of treatment regimens increases exponentially, so you will probably need a model for the outcomes as well.

If you have a sequentially randomized experiment, the treatment probabilities are known, so with IPTW it may only be necessary to model the outcomes, especially if t gets large. In that situation IPTW would be preferred, as the g-formula would also require modelling the distribution of the time-varying confounders, and if there are a lot of confounders, that can be very difficult. In an observational study, however, the assignment probabilities are unknown and you need to model them, and misspecification of the model for the assignment process can lead to badly biased estimates, as can misspecification of the outcome model. There are also doubly robust estimators; if you're interested, I refer you to the article by Bang and Robins. In an observational study, a good strategy might be to use both approaches and see whether the estimates obtained agree adequately. While such agreement doesn't actually indicate that either or both approaches led to a good estimate, it does offer some grounds for reassurance, and certainly if you don't have agreement, that would be a cause for concern.
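As an illustration of how the product weights are built up over time, here is a minimal two-period sketch in Python. The data layout (one row per subject), the column names, and the particular unsaturated marginal structural mean model E[Y(z1, z2)] = b0 + b1(z1 + z2) are assumptions made for illustration only.

```python
# Two-period stabilized IPTW weights and a simple marginal structural mean model,
# assuming a DataFrame `df` with treatments "z1", "z2", time-varying covariates
# "x1" (before z1) and "x2" (before z2), and end-of-study outcome "y".
import numpy as np
import pandas as pd
import statsmodels.api as sm

def msm_two_period(df):
    # Denominator models: P(Z_t = 1 | past treatment, past covariates).
    d1 = sm.Logit(df["z1"], sm.add_constant(df[["x1"]])).fit(disp=0)
    d2 = sm.Logit(df["z2"], sm.add_constant(df[["z1", "x1", "x2"]])).fit(disp=0)
    # Numerator models: P(Z_t = 1 | past treatment only), which stabilize the weights.
    n1 = sm.Logit(df["z1"], np.ones((len(df), 1))).fit(disp=0)
    n2 = sm.Logit(df["z2"], sm.add_constant(df[["z1"]])).fit(disp=0)

    def prob_of_observed(model, exog, z):
        # Probability of the treatment actually received at that time point.
        p = model.predict(exog)
        return np.where(z == 1, p, 1.0 - p)

    # Stabilized weight: product over t of P(Z_t | past Z) / P(Z_t | past Z, past X).
    num = (prob_of_observed(n1, np.ones((len(df), 1)), df["z1"])
           * prob_of_observed(n2, sm.add_constant(df[["z1"]]), df["z2"]))
    den = (prob_of_observed(d1, sm.add_constant(df[["x1"]]), df["z1"])
           * prob_of_observed(d2, sm.add_constant(df[["z1", "x1", "x2"]]), df["z2"]))
    sw = num / den

    # Weighted fit of the (unsaturated) marginal structural mean model in cumulative
    # treatment; robust standard errors acknowledge that the weights were estimated.
    design = pd.DataFrame({"const": 1.0, "cum_z": df["z1"] + df["z2"]})
    return sm.WLS(df["y"], design, weights=sw).fit(cov_type="HC1")
```

With more time points, the numerator and denominator models are simply extended and the per-period ratios multiplied together, which is where the concern about repeated opportunities for misspecification comes from.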
Now more generally, it should also be remembered that in observational studies, as opposed to sequentially randomized experiments, the analysis is also predicated on the sequential randomization assumption, which isn't really testable. Marginal structural mean models can be estimated in both SAS and Stata. In the next lesson, we are going to discuss another method, so-called g-estimation, for estimating treatment effects using structural nested mean models in longitudinal causal studies.