I want to give a very short treatise on something that is actually quite complicated, but I just want to give you a conceptual sense of something that can be done with the results from Cox regression. If we added a third term to the course, we could explore this in more detail with more examples. So I want to show you that we can back-compute survival curve estimates from the Cox regression results and present these graphically. What I want you to get out of this short lecture section is to know, at the end of it, that the results from any Cox regression model can be translated into estimated survival curves for groups with different X values, so we can plot them separately depending on the value of X. I want you to appreciate the concept of how this is done. So suppose I have the results from a Cox regression. Given any value of X and any time value, I can compute the log hazard of the outcome for that group at that time using the resulting Cox regression equation. Could I translate this, at every given time and for every given group defined by X, from the log hazard scale back into an estimated survival curve? The short answer is yes, it can be done, but it's mathematically involved. Further, when X1 is continuous, it's not generally possible to display the estimated survival curves for all possible values of X1 in a sample, even if the curves can theoretically be estimated. So for display purposes, several values of X1 can be specified and the survival curves can be displayed specifically for those values. So let me give you the longer answer. It's going to involve a little math that I don't expect you to know. If you're not familiar with it, don't worry; I just want you to appreciate the concept. But for those of you who like math, I figured I would put it on here just FYI.
So the longer answer is this. We know that the end result of Cox regression, from a user's perspective, is this equation: the log hazard of the outcome is the sum of an intercept that is a function of time and a slope times a given value of X1. So for any given value of X1 at any specific time, these two things can be computed and we come out with a single number. Using the computer, we can figure out what the log of the estimated baseline hazard is, evaluated at a specific time, and then we'll have a numerical value of beta one hat and a value of X1, and we'll sum these up and get a number. Once I have that number, I can translate the estimated log hazard for that group at that time into an estimated hazard at that time for that group by exponentiating the result. That will give me the estimated instantaneous risk of the event occurring at that time for that group. What I can then do is express survival as a function of what's called the cumulative hazard up through a given time t. So a group's survival beyond a given time t is a function of how much hazard they've accumulated from the start of the study up through that specific time t. At any time t, the cumulative hazard can be computed by the following formula: it's the integral from zero to time t of the individual time-specific hazards for that given group, integrated across the range of times. Integration is like summation, so you can think of the cumulative hazard as being the cumulative sum of all the time-specific hazards for any given group with their X1 value, up through and including the time we're looking at. Then survival is a function of the cumulative hazard in the following form: the estimated survival, or the proportion in a given group defined by its X1 value surviving beyond the time t we're looking at, is the exponentiated value of the negative cumulative hazard.
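For those who like to see it written out, the chain of steps just described can be sketched as follows. The notation is an assumption for illustration (the symbols on the lecture slides may differ): lambda hat naught of t is the estimated baseline hazard, beta one hat is the estimated slope, H is the cumulative hazard, and S is the proportion surviving beyond time t.

```latex
\begin{align*}
\log \hat{\lambda}(t \mid x_1) &= \log \hat{\lambda}_0(t) + \hat{\beta}_1 x_1
  && \text{(Cox regression equation, log hazard scale)} \\
\hat{\lambda}(t \mid x_1) &= \hat{\lambda}_0(t)\, e^{\hat{\beta}_1 x_1}
  && \text{(exponentiate to get the hazard)} \\
\hat{H}(t \mid x_1) &= \int_0^t \hat{\lambda}(u \mid x_1)\, du
  && \text{(accumulate the hazard from 0 to } t\text{)} \\
\hat{S}(t \mid x_1) &= e^{-\hat{H}(t \mid x_1)}
  && \text{(translate cumulative hazard into survival)}
\end{align*}
```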
So let's just step back and not think too mathematically for a moment. Let me ask you this: what does this function, e to the negative H of t, tell us in the end? As the cumulative hazard incurred by a group increases, what happens to e to the negative of that cumulative hazard? We're raising e to a more negative power, so as the cumulative hazard increases, the survival decreases. That makes some sense, right? The more risk that a person or a group accumulates over time, the less likely they are to survive beyond that time. So that's the relationship. We could do this, or rather the computer could do this, across all the different time points for all the different groups to get estimated survival curves across the follow-up period for each of the unique groups defined by X1. Let me show you an example of this, and we're going to compare these results to what we get if we use the Kaplan-Meier approach to estimate survival for the different groups defined by our X variable. So let's look at the gestational age and mortality data we had from the Nepalese children. Recall that, for reasons we discussed, the ultimate decision was to categorize gestational age because the relationship between the log hazard of mortality and gestational age was not strictly linear. We saw the log hazard dropped off quickly going from preterm to full term, and then there wasn't as much of an impact of gestational age on survival thereafter. So we've got five groupings: preterm, which is less than 36 weeks, then 36 to 38 weeks, 38 to 39 weeks, 39 to 41 weeks, and 41 plus weeks.
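To make the "computer could do this across all the time points" idea concrete, here is a minimal Python sketch of the computation: exponentiate the log hazard, sum the hazards over time, then take e to the negative of that sum. Everything numeric in it is made up for illustration (the baseline hazard function, the slope of -0.7, the time grid); none of it comes from the Nepalese data, and real software estimates the baseline hazard from the data rather than taking a formula for it.

```python
import math


def survival_curve(times, baseline_hazard, beta, x):
    """Approximate S(t | x) = exp(-H(t | x)) on an increasing grid of times.

    The integral defining the cumulative hazard H is approximated by a
    simple running sum: hazard at each grid time multiplied by the width
    of the interval ending at that time.
    """
    survival = []
    cumulative_hazard = 0.0
    previous_time = 0.0
    for t in times:
        # Hazard for this group at time t: baseline hazard times exp(beta * x).
        hazard = baseline_hazard(t) * math.exp(beta * x)
        # "Integration is like summation": add this interval's contribution.
        cumulative_hazard += hazard * (t - previous_time)
        # Survival is e raised to the negative cumulative hazard.
        survival.append(math.exp(-cumulative_hazard))
        previous_time = t
    return survival


def made_up_baseline_hazard(t):
    """Hypothetical baseline hazard that is highest early on and then falls."""
    return 0.05 / (1.0 + t)


times = [0.5 * k for k in range(1, 13)]  # hypothetical follow-up grid
made_up_beta = -0.7                      # hypothetical log hazard ratio

reference_curve = survival_curve(times, made_up_baseline_hazard, made_up_beta, x=0)
comparison_curve = survival_curve(times, made_up_baseline_hazard, made_up_beta, x=1)

for t, s_ref, s_cmp in zip(times, reference_curve, comparison_curve):
    print(f"t={t:4.1f}  S(t | x=0)={s_ref:.3f}  S(t | x=1)={s_cmp:.3f}")
```

Because the made-up slope is negative, the x=1 group accumulates hazard more slowly than the reference group, so its estimated survival curve sits above the reference curve at every time.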
So let me show you. I'm only going to put two of the five groups on here, just so we can see what's going on in the graph. What I'm showing on this graphic with the solid red lines are the curves based on the Cox regression, going through the computation I talked about before, obviously done by the computer. These are the estimated survival curves from that Cox regression for two gestational age groups: the preterm group, those with 32 to 36 weeks, and the first full-term group, 36 to 38 weeks. The dotted darker lines, the black lines, are the estimates from the Kaplan-Meier approach. You can see that, at least in this particular example, the estimated Cox curves and the Kaplan-Meier curves are similar. One big distinction between them is that the underlying shape of the hazard that created these two Cox curves is the same for the two groups, because of the proportional hazards assumption. So these two curves borrow information about the shape; they both use the underlying shape that we saw in the original reference group of 32 to 36 weeks, whereas the Kaplan-Meier estimates are independent of one another. They do not assume a common shape or borrow information from each other in forming those shapes. So that's one big distinction between the two. But in many situations, quite frankly, the estimated survival curves based on a simple Cox regression model with a categorical predictor, at least for some of the categories, and those based on the Kaplan-Meier approach will be similar in nature. Another thing we can do: you may recall that in the last section I said there is an uncertainty component to the intercept part of the Cox regression model, the log of that function lambda hat naught of t. We looked at the uncertainty in the slope, but I also said there's uncertainty in the intercept that is figured out by the partial likelihood algorithm that finds the best-fitting model, and this is incorporated.
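One way to see what "borrowing the shape" means for the Cox curves: under proportional hazards, the comparison group's cumulative hazard is the hazard ratio times the reference group's cumulative hazard at every time, so its survival curve is the reference survival curve raised to the power of the hazard ratio. The short Python sketch below illustrates that relationship only. The baseline survival values and the hazard ratio of 0.5 are made up for illustration and are not the estimates from the gestational age analysis.

```python
import math


def cox_curve_from_baseline(baseline_survival, hazard_ratio):
    """Survival curve for a comparison group under proportional hazards.

    If S0(t) = exp(-H0(t)) for the reference group, then the comparison
    group has S1(t) = exp(-HR * H0(t)) = S0(t) ** HR at every time t.
    """
    return [s0 ** hazard_ratio for s0 in baseline_survival]


# Hypothetical reference-group survival at successive follow-up times.
made_up_baseline = [1.0, 0.97, 0.95, 0.94, 0.935]
# Hypothetical hazard ratio (comparison group versus reference group).
made_up_hazard_ratio = 0.5

comparison = cox_curve_from_baseline(made_up_baseline, made_up_hazard_ratio)

for s0, s1 in zip(made_up_baseline, comparison):
    line = f"reference S={s0:.3f}  comparison S={s1:.3f}"
    if s0 < 1.0:
        # Ratio of cumulative hazards; it equals the hazard ratio at every time.
        ratio = math.log(s1) / math.log(s0)
        line += f"  cumulative hazard ratio={ratio:.3f}"
    print(line)
```

The comparison curve here is completely determined by the reference curve and one number, the hazard ratio. A Kaplan-Meier curve for the same group would instead be built from that group's own event times, with no such link to the reference group.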
We can get confidence limits for the survival curves based on the Cox regression estimates. Because the survival curves are a function of both this intercept and potentially the slope or slopes, depending on which group you are looking at, the uncertainty in the survival curve estimates is a function of the uncertainty in both of those. So what I can show on this graphic is the Cox regression based survival curve for the preterm group, the Cox regression based survival curve for the 36 to 38 week gestational age group, and then the confidence bands. At any point on either of these curves, if we went straight up, that would be the upper limit on the proportion surviving beyond the given time in that particular group, and if we went straight down, that would be the lower bound. So you can see the two curves are distinct. We could have gone back to the Cox regression and looked at the inference on the hazard ratio comparing the 36 to 38 week group to the reference preterm group, and we would see it was a statistically significant difference, but here you get a visual cue of that because the confidence limits on the survival curves for the two groups do not overlap at all. So this was a very quick treatise, just to let you know that with these Cox regression models we aren't stuck with relative estimates based on the hazard ratio. These can be translated back into absolute proportion estimates via the survival curve estimates, and that's something you can expect to see in papers that use Cox regression. For simple analyses they may present the results from the Kaplan-Meier approach, or maybe the results from the Cox model, but what we'll see moving forward in this course is that when we have a Cox regression model with multiple predictors, we can get Cox-based estimated survival curves taking into account different combinations of multiple predictor values.
So the results from Cox regression do yield an estimated hazard ratio, or multiple hazard ratios if we have a multicategorical predictor, each with a 95 percent confidence interval, which allows for the quantification of the relationship between a time-to-event outcome and a predictor on the relative scale, as I was saying before. I say with the aid of the computer, which makes it sound like we wouldn't need the computer for any other part of this, but of course we need it to do the entire regression estimation process from start to finish. So with additional aid from the computer, the resulting Cox regression can be used to estimate cumulative survival curves as alternatives to those estimated by the Kaplan-Meier method. In many situations the two methods will produce similar estimates, but the Cox-based curves are constructed under a proportional hazards assumption whereas the Kaplan-Meier curves are not. So this is another utility of this method: back-constructing survival curves based on models fit under a proportional hazards framework. So again, when we have individual-level data available, we can estimate these Cox regression models and translate them back into estimated survival curves. This is important, especially when we get into the realm of multiple regression, where we'll see how useful this is for describing the time-to-event experience unfolding over time for different groups defined by different X values. These are all estimated under a common assumption that the general shape of the trajectory of events unfolding over time is the same regardless of the groups we're looking at. That's the proportional hazards assumption.