Remember that ANOVA consists of three steps. In the previous video, we talked about the first step: when to use an ANOVA and how to organize your data. In this video, I will explain how to perform a basic ANOVA, reusing the example from the previous video. And in the next video, I will show you how to validate the conclusions using a residual analysis.
Okay, we were studying the moisture percentage of batches of coffee, and we were wondering if this is influenced by the machine the batch is produced on. Therefore, the moisture percentage is our Y variable, or the CTQ in our project. The machine is our X variable, or the influence factor. Moisture is a numerical variable and machine is a categorical variable. Using our tree diagram, we see that we need to perform an ANOVA test to see if machine is a significant influence factor. These are the three steps of ANOVA. We will focus on the main analysis step. The goal is to study if the group means are identical for each level of your X variable, which means that we will study if the mean moisture percentage is equal for each machine. This was our collected data for machine.
Now, in order to perform this ANOVA step, we will go, of course, to Minitab. Please pause the video and load your data before continuing. This is what your data in Minitab should look like. You have Machine 1, 2, 3 and 4 in the first four columns, but the first step of ANOVA is to organize your data, and I have already done this by stacking my data into the columns Moisture and Machine. ANOVA is a statistical technique, so you can find it under the menu Stat. Under ANOVA we take the One-Way option. Now, Minitab asks you: what is your Response? That's your Y variable or the CTQ, and that's of course Moisture. The next question is: what is the Factor? Well, the factor is the influence factor or X variable, and that's Machine. We also go to Graph and we ask for an Individual value plot. You can uncheck the other plots because we don't need them.
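If you are following along outside Minitab, the stacking step can be sketched in a few lines of Python. This is only an illustration under assumptions: the moisture values are invented (they are not the data from the video), and the helper name `stack` is hypothetical.

```python
# Hypothetical wide-format worksheet: one column of moisture percentages
# per machine. The numbers are invented for illustration only.
wide = {
    "Machine 1": [12.1, 11.8, 12.4],
    "Machine 2": [11.2, 11.5, 11.0],
    "Machine 3": [12.3, 11.9, 12.2],
    "Machine 4": [11.3, 11.6, 11.1],
}


def stack(wide_columns):
    """Turn one-column-per-machine data into two parallel columns:
    'Moisture' (the Y variable) and 'Machine' (the X variable)."""
    moisture, machine = [], []
    for name, values in wide_columns.items():
        for value in values:
            moisture.append(value)
            machine.append(name)
    return {"Moisture": moisture, "Machine": machine}


stacked = stack(wide)

# Print the stacked worksheet, one measurement per row.
for value, name in zip(stacked["Moisture"], stacked["Machine"]):
    print(f"{value:5.1f}  {name}")
```

Each row of the stacked result holds one measurement together with the machine it came from, which is the layout the Response and Factor fields expect.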
Okay, under Options, we uncheck the Assume equal variances option. When starting this analysis, we do not know whether we can assume this or not. See the video Test for equal variances for more information on this assumption.
Okay, let's study the output. You will get an Individual Value Plot and you will also see that you have quite some output in your session window. The Individual Value Plot shows the measured moisture percentage for each machine. The blue line connects the group means. At first sight, we see differences in the group means, as the line is not horizontal. Machines 1 and 3 appear to produce coffee with relatively high moisture content. Note that you will never get exactly horizontal lines, even if the group means are in reality equal. Especially for small data sets, you should statistically test whether the differences between the means are real or due to random fluctuations. That is why we look at the statistical analysis in the session window. The p-value indicates the chance that differences this large would occur due to random fluctuation alone if the group means were in reality equal. In this example, it is 1.2%. As such differences would be so unlikely to arise by chance alone, we conclude that the machines differ. In fact, the threshold is often set at 5%, and the p-value here is below this 5%. This means that the effect is significant and that it translates to the population. If the p-value had been larger than 0.05, it would mean that either there is no real difference or you did not gather enough data to demonstrate it.
The statistical tests involving p-values are formally called hypothesis tests. For this ANOVA analysis, the hypothesis that all group means are equal is our null hypothesis. The hypothesis that there is a significant difference between the group means is the alternative hypothesis. In this case, the low p-value suggests that it is very unlikely that the group means are equal.
Thus, we can reject the null hypothesis and support the alternative. It is important to note that the p-value only tells you whether an influence factor has a significant effect. This does not mean that it has a large effect. So, we also have to wonder whether the influence factor is relevant. Let's have a look at how relevant the effect of machine in our example is.
We go back to Minitab. To see how relevant an influence factor is, we can take a look at the R-squared. The R-squared always lies between 0% and 100% and tells you how much of the variation in your Y variable is explained by your X variable. In our example, the R-squared is 26%, so the influence factor machine accounts for, or explains, 26% of the variation in the CTQ moisture percentage. Let's have a closer look at the R-squared. Our data looked like this, and we see that the machines have some effect on the moisture content. Now consider the circled measurements in the graph. There is a difference between these two measurements, and it cannot be explained by the machine, as they have been produced on the same machine. This is part of the other 74% of variation that is not explained by the machines but is the result of other factors.
Suppose that this was the measured data. What do you think the R-squared would be for this dataset? The measurements are closer to the average of their machine, meaning that the machine explains more of the variation. The R-squared will thus be higher: 58%. We say that an influence factor is more relevant in the right example than it is in the left example. The R-squared measures the impact of the influence factor on the dependent variable. If it is larger, we can say the influence factor is more important or relevant: it is a big fish. If the R-squared is small, this means the explanatory power of the influence factor is weak and that vital influence factors are missing. In other words, this influence factor is a small fish.
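To make the two numbers a bit more concrete, here is a rough Python sketch of how an R-squared and a p-value can be obtained from grouped data. Several things here are assumptions: the moisture values are invented, the function names are hypothetical, the R-squared uses the classic between-group and within-group sums of squares, and the p-value comes from a simple permutation test that is swapped in for illustration. It is not the calculation Minitab performs, especially with Assume equal variances unchecked.

```python
import random
from statistics import mean

# Hypothetical moisture percentages per machine, invented for illustration
# (not the data from the video).
groups = {
    "Machine 1": [12.1, 11.8, 12.4, 12.0],
    "Machine 2": [11.2, 11.5, 11.0, 11.4],
    "Machine 3": [12.3, 11.9, 12.2, 12.5],
    "Machine 4": [11.3, 11.6, 11.1, 11.7],
}


def sums_of_squares(samples):
    """Return (ss_between, ss_within) for a list of groups."""
    grand_mean = mean(x for g in samples for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples)
    ss_within = sum((x - mean(g)) ** 2 for g in samples for x in g)
    return ss_between, ss_within


def r_squared(samples):
    """Share of the total variation explained by group membership."""
    ss_between, ss_within = sums_of_squares(samples)
    return ss_between / (ss_between + ss_within)


def f_statistic(samples):
    """Classic one-way ANOVA F: between-group over within-group mean square."""
    ss_between, ss_within = sums_of_squares(samples)
    k = len(samples)
    n = sum(len(g) for g in samples)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(samples, n_shuffles=2000, seed=0):
    """Estimate how often random relabelling of the measurements gives an
    F statistic at least as large as the observed one."""
    rng = random.Random(seed)
    observed = f_statistic(samples)
    pooled = [x for g in samples for x in g]
    sizes = [len(g) for g in samples]
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Count the observed arrangement itself so the estimate is never zero.
    return (hits + 1) / (n_shuffles + 1)


samples = list(groups.values())
print(f"R-squared: {r_squared(samples):.1%}")
print(f"Permutation p-value: {permutation_p_value(samples):.4f}")
```

The shuffle mimics the null hypothesis: if the machine did not matter, the labels could be reassigned at random without changing the picture much. The R-squared, on the other hand, only compares how much of the spread sits between the machines with how much remains within them.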
Summarizing, the p-value tells us whether the differences between the means of the machines are significant, meaning that these differences are real and not due to random fluctuations. The R-squared tells us how relevant the effect of the machines is. Always remember that ANOVA consists of three steps, and that before you can be sure that the conclusions in your second step are valid, you will have to perform a residual analysis. For that, see the next video.