So, in contrast to the g-formula and IPTW, which we used to estimate effects such as the average treatment effect of regimen z-bar_T versus z-bar-star_T, or the same contrast conditioned on covariates, G-estimation is used to estimate treatment effects conditional on past treatment and confounder history. So here, we're looking at an estimand defined at time t + k, where k can be zero or anything up through T minus t. What's happening is that in the first potential outcome, treatment follows the regimen z-bar_t through time t and is off treatment ever after, while in the second, treatment follows z-bar_{t-1} through time t minus 1 and is off treatment from time t on. So these sequences are identical except that they can differ in time period t, and then we condition on the covariate history up to time t and also the treatment history up through time t. If you look at that, you're thinking: at time t, in the first one you get treatment z_t, in the second one you don't get treated, and we're conditioning on you actually having value z_t at time t. So it's sort of similar to an effect of treatment on the treated, in a certain sense; we'll get to more about that later. We're going to call this contrast Gamma_tk, a function of the histories x-bar_t and z-bar_t, and then there's this Psi: Gamma_tk is a known function of x-bar_t and z-bar_t that depends on parameters Psi, and Psi-star is the true value of those parameters. Now, in some instances the outcome is only measured in the final period, or interest resides only in the effects at the end. For simplicity, I'm going to focus on structural nested mean models, with effects defined on the difference scale, but G-estimation can be used to estimate treatment effects on other scales.
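Putting that verbal description into symbols, a plausible reconstruction of the estimand (my notation, with overbars denoting histories and (z-bar_t, 0) meaning "follow z-bar through t, then no treatment thereafter") is:

```latex
\gamma_{tk}(\bar{x}_t, \bar{z}_t; \psi^*)
  = E\left[ Y_{t+k}(\bar{z}_t, \mathbf{0}) - Y_{t+k}(\bar{z}_{t-1}, \mathbf{0})
    \mid \bar{X}_t = \bar{x}_t,\; \bar{Z}_t = \bar{z}_t \right],
  \qquad k = 0, 1, \ldots, T - t.
```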
For example, odds ratios; and in addition, there are other kinds of structural models along these lines, for example structural models for distributions, that can be defined and estimated using G-estimation. There's a nice article in Statistical Science by Vansteelandt and Joffe on which I'm going to draw very heavily, and you should see them for a more general treatment. I warn you: this is probably the most complicated material we're going to see in the entire course. So now, each estimand in the formula we previously had is the effect in period t + k of treatment in period t followed by no further treatment, versus no treatment in period t also followed by no further treatment, for units with covariate history x-bar_t that actually followed the regimen z-bar_t through period t. That's a much more succinct and nicer way of saying it than I said it before, but there's a lot there. So, for example, if T is 2 and t is 1, you can get the effects that are defined by just plugging into the formula above. Structural nested mean models are structural because, again, they are models for potential outcomes, and they are nested because in each period the effects are conditioned on the history prior to the current treatment. You can see that because, for an effect of treatment at time t, we're conditioning on the covariates up to that time and the treatment up to that time. Vansteelandt and Joffe describe various types of such models, including structural distribution models, structural models for survival data, etc. They also provide a nice treatment of estimation. I'm going to follow their exposition, but only consider the most elementary issues. As I said, this topic is more difficult; it's difficult enough, you'll see, even for the elementary case I'm going to follow, and readers who aren't especially interested may just want to skim this section. Then there's the reader who is very interested, say who wants to use this material or really learn it.
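For instance, with T = 2 and t = 1, plugging into the estimand gives effects of the following form (a reconstruction under the notation above, for k = 0 and k = 1):

```latex
\gamma_{1,0}(x_1, z_1; \psi^*) = E\left[ Y_1(z_1) - Y_1(0) \mid X_1 = x_1,\; Z_1 = z_1 \right],
\qquad
\gamma_{1,1}(x_1, z_1; \psi^*) = E\left[ Y_2(z_1, 0) - Y_2(0, 0) \mid X_1 = x_1,\; Z_1 = z_1 \right].
```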
Well, they're just going to have to pursue the topic by consulting the Vansteelandt and Joffe paper and the references therein, and also the Hernan and Robins book. But this material is harder. So, let's just start with the case T = 1 to get some motivation, where Z_1 is 1 if treated and Z_1 is 0 if not. A structural mean model would contrast treatment with its absence among subjects with covariates X_1. You can see that this is the same form as before, but just for this one case. So now, Gamma-star just depends on X_1 and Z_1 and this Psi. What do we want Psi-star to be? Well, if little z_1 is zero, the contrast has to be zero, so we want Gamma-star to be zero there, and Psi is a vector of parameters with true value Psi-star. So Gamma-star of x_1, z_1, and Psi-star might, for example, equal Psi-star times z_1, for the case where the effect doesn't depend on the covariates. You can see here that the structural mean model is thus a model for the effect of treatment on the treated at each value x_1 of the covariates. We considered this in Causal Inference 1, and recall that to identify this parameter, we only needed to assume that Y(0) is independent of Z_1 given X_1, because the observed expectation of Y conditional on X_1 and Z_1 = 1 is actually equal to the expectation of the potential outcome Y(1) given X_1 and Z_1 = 1; we directly observe what we need there. So, we didn't need to make any identification assumption about Y(1). We also considered various ways to estimate this parameter, for example inverse probability of treatment weighting and matching. The thing is, those procedures aren't going to extend very well to the more general kinds of estimands we're considering now. So, we need another way to estimate these parameters, and here's how we do it: we define a variable U-star of Psi, which is Y minus this Gamma-star.
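In symbols, the T = 1 structural mean model and the variable just defined are (again a reconstruction in my notation; note the model is automatically zero when z_1 = 0):

```latex
\gamma^*(x_1, z_1; \psi^*) = E\left[ Y(z_1) - Y(0) \mid X_1 = x_1,\; Z_1 = z_1 \right],
\qquad \text{e.g. } \gamma^*(x_1, z_1; \psi^*) = \psi^* z_1,
\qquad U^*(\psi) = Y - \gamma^*(X_1, Z_1; \psi).
```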
Then it follows, if you write it out by going back to what Gamma-star is and conditioning on X_1 and Z_1 = z_1, that the conditional expectation of U-star of Psi-star given X_1 and Z_1 equals the expectation of Y(0) conditional on X_1 and Z_1; and then using the no-unmeasured-confounding condition, that conditional expectation given X_1 and Z_1 is equal to the conditional expectation given X_1 alone. Note that, because we're going to use that fact to set up an estimation scheme. The consequence is that Psi-star can be estimated using estimating equations: assuming a sample of size n, we set the sums to zero, where d-star is a vector of known functions with q components, because we're going to estimate q parameters here. To give you an example, in the case we saw before where Gamma-star was equal to Psi-star times z_1, d would be one-dimensional, and you might take d of z_1 and x_1 to be z_1. Then d minus its conditional expectation would be Z_1 minus the conditional expectation of Z_1 given X_1, and that should be fairly reminiscent of what we did back in Causal Inference 1. Those who aren't familiar with the estimating-equations approach can look at the text by Bickel and Doksum on this subject; they have a nice treatment, and there are certainly nice treatments in other places as well. To implement this approach, Vansteelandt and Joffe point out that it's necessary to model the propensity score, which is needed of course for the conditional expectation of d given X_1, and also to model the expected value of U-star given X_1. In their paper, they also discuss efficient estimation and double robustness, which is more than we want to get into here. To estimate the effects more generally, we know we're going to use these same estimating equations.
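To make this concrete, here is a small simulation sketch of g-estimation in the T = 1 case with Gamma-star = Psi-star times z_1 and d(z_1, x_1) = z_1. Everything here (the data-generating process, sample size, variable names) is my own illustrative assumption, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: one confounder X, binary treatment Z, outcome Y.
n = 20000
x = rng.normal(size=n)
true_ps = 1.0 / (1.0 + np.exp(-0.5 * x))          # true propensity score P(Z=1 | X)
z = rng.binomial(1, true_ps)
psi_true = 1.5
y = 2.0 + x + psi_true * z + rng.normal(size=n)   # Y = Y(0) + psi * Z, constant effect

# Model the propensity score with a logistic regression fit by Newton-Raphson
# (illustrative; any consistent model for E[Z | X] would do).
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (z - p)                          # score of the logistic likelihood
    hess = (X * (p * (1 - p))[:, None]).T @ X     # Fisher information
    beta += np.linalg.solve(hess, grad)
ps_hat = 1.0 / (1.0 + np.exp(-X @ beta))

# G-estimation: with U(psi) = Y - psi * Z and d(Z, X) = Z, solve
#   sum_i (Z_i - e(X_i)) * (Y_i - psi * Z_i) = 0,
# which is unbiased at psi* because E[U(psi*) | X, Z] does not depend on Z.
# The equation is linear in psi, so it has a closed-form solution.
resid = z - ps_hat
psi_hat = np.sum(resid * y) / np.sum(resid * z)
print(round(float(psi_hat), 2))
```

Centering U-star around an estimate of its conditional mean given X_1, as Vansteelandt and Joffe discuss, improves efficiency and is where double robustness comes in; the uncentered version above is consistent provided the propensity-score model is correct.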
So, to estimate the effects in the general estimand that we started off with, the idea is to extend the treatment above. For all time periods, you define variables U_tk-star of Psi, and we define them so that in conditional expectation they are equal to the expectation of the potential outcome Y_{t+k} under the regimen (z-bar_{t-1}, 0), which makes sense; you can see that this extends what we just did. Then, using the sequential randomization assumption, this reduces to the expectation given the history through x-bar_t and z-bar_{t-1}, and we can use this to construct estimating equations. Vansteelandt and Joffe have further details.
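In symbols, the defining property just described is (a reconstruction; the explicit construction of the U variables is in Vansteelandt and Joffe):

```latex
E\left[ U^*_{tk}(\psi^*) \mid \bar{X}_t, \bar{Z}_t \right]
  = E\left[ Y_{t+k}(\bar{Z}_{t-1}, \mathbf{0}) \mid \bar{X}_t, \bar{Z}_t \right]
  = E\left[ Y_{t+k}(\bar{Z}_{t-1}, \mathbf{0}) \mid \bar{X}_t, \bar{Z}_{t-1} \right],
```

with the last equality following from sequential randomization, which is what lets the estimating-equation construction go through period by period.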