Statistical interaction describes a relationship between two variables that is dependent upon, or moderated by, a third variable.
>> For instance, do you prefer ketchup or soy sauce? Obviously, your answer depends on what food you're eating. If you're eating sushi, you probably prefer soy sauce. If you're having a burger and fries, you're probably going to want ketchup.
>> In this case, the third variable is referred to as the moderating variable, or simply, the moderator. The effect of the moderating variable is often characterized statistically as an interaction. The moderator is a third variable that affects the direction and/or strength of the relationship between your explanatory, or x, variable and your response, or y, variable. What if the population we're studying has different subgroups? Could it be that, as in the soy sauce and ketchup example, different subgroups have a moderating effect on our association of interest?
>> To explore this idea, we're going to use a hypothetical study and some made-up data. In our imaginary study, we're looking at two diets and their effects on weight loss. Diet A is a low-carbohydrate plan. Diet B is a low-fat plan. Our hypothetical study also recorded which exercise program participants chose: cardiovascular exercise or weight training.
>> Our variables of interest are diet and weight loss. We've added a third variable, exercise plan, to help us understand moderation, or statistical interaction. So what's the association between diet plan, A or B, our explanatory variable, and weight loss, our quantitative response variable? This table shows our hypothetical data on diet, weight loss, and exercise plan. Since we have a categorical explanatory variable, diet plan A or B, and a quantitative response variable, weight loss, we will of course need to use analysis of variance (ANOVA) to evaluate the association. This model's SAS syntax should look familiar to you.
After PROC ANOVA, the CLASS statement names our categorical explanatory variable, and the MODEL statement sets the quantitative response variable equal to the categorical explanatory variable. Finally, the categorical explanatory variable is listed in the MEANS statement. In this example, the syntax would look like this.
The resulting output for this analysis is shown here. As you can see, we're testing the association between diet (A or B) and weight loss. There are 40 observations in the data set. The F value is 12 and is associated with a significant p value, that is, a p value less than .05. While this tells us there is a significant association between diet type and weight loss, to understand that association we need to look at the output generated by the MEANS statement. Here we see that the average one-month weight loss for diet A is about 14.7 pounds, and that the average one-month weight loss for diet B is about 9.3 pounds. So, in conjunction with the significant p value, we can say that diet plan A is associated with significantly greater weight loss than diet plan B.
Here we show the finding graphically, as a bar chart, with diet, the explanatory variable, on the x-axis and mean weight loss, our response variable, on the y-axis. What about a third variable, exercise program? Would we see the same association between diet and weight loss among participants doing cardio as among participants doing weight training?
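The lecture works through the ANOVA only via SAS output. As a rough sketch (written in Python with only the standard library, not SAS), the code below computes the same kind of F ratio a one-way ANOVA reports, the per-diet means that a MEANS statement prints, and then repeats the F computation separately within each exercise plan, which is one informal way to approach the closing question. The records are invented toy values, not the lecture's 40 observations, and the function names (`group_by_diet`, `group_means`, `one_way_anova_f`, `f_by_subgroup`) are hypothetical. The sketch computes F only; the p value comes from the F distribution, which a library routine such as `scipy.stats.f_oneway` also returns.

```python
from collections import defaultdict

# Invented toy records: (diet, exercise plan, one-month weight loss in pounds).
# These are NOT the lecture's data set; they only exercise the code.
RECORDS = [
    ("A", "cardio", 12.0), ("A", "cardio", 14.0),
    ("B", "cardio", 8.0), ("B", "cardio", 10.0),
    ("A", "weights", 11.0), ("A", "weights", 13.0),
    ("B", "weights", 11.0), ("B", "weights", 13.0),
]


def group_by_diet(records):
    """Collect weight-loss values for each diet (the explanatory variable)."""
    groups = defaultdict(list)
    for diet, _exercise, loss in records:
        groups[diet].append(loss)
    return dict(groups)


def group_means(groups):
    """Mean response per group, similar to the means a MEANS statement reports."""
    return {name: sum(vals) / len(vals) for name, vals in groups.items()}


def one_way_anova_f(groups):
    """F = between-group mean square / within-group mean square.

    Assumes at least two groups and nonzero within-group variation.
    """
    all_values = [v for vals in groups.values() for v in vals]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    means = group_means(groups)
    ss_between = sum(len(vals) * (means[g] - grand_mean) ** 2
                     for g, vals in groups.items())
    ss_within = sum((v - means[g]) ** 2
                    for g, vals in groups.items() for v in vals)
    ms_between = ss_between / (k - 1)      # df between = k - 1
    ms_within = ss_within / (n_total - k)  # df within = N - k
    return ms_between / ms_within


def f_by_subgroup(records):
    """Repeat the diet vs. weight-loss F computation within each exercise plan."""
    plans = sorted({exercise for _diet, exercise, _loss in records})
    return {
        plan: one_way_anova_f(group_by_diet([r for r in records if r[1] == plan]))
        for plan in plans
    }


if __name__ == "__main__":
    overall = group_by_diet(RECORDS)
    print("overall means:", group_means(overall))
    print("overall F:", round(one_way_anova_f(overall), 3))
    print("F within each exercise plan:", f_by_subgroup(RECORDS))
```

With these made-up numbers, diet A and diet B differ within the cardio records but not within the weight-training records, the kind of pattern a moderator produces. Running a separate test per subgroup is a descriptive check; a formal test of interaction would add a diet-by-exercise term to a two-way model.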