I am sitting in a medium-sized classroom. The sun is shining through the window on my left, and around me are sitting about 20 other students. The professor in front of the group is telling us a story about a man. It is the story of this man that made me realize that I really like statistics. For me, his story shows that statistics is about solving daily problems and not about abstract formulas and mathematics. So I really want to share this story with you as well.

This man is called William Sealy Gosset and he lived in England around 1900. He studied chemistry and mathematics, and upon graduating he started to work for the Guinness Brewery. Gosset applied his statistical knowledge to select the best-yielding varieties of barley. He used trials, and as it is obviously very time-consuming to grow barley, he could not do many trials. Therefore, he was faced with only very small samples. To deal with this, he developed a special test to be able to draw conclusions from small samples and to determine which barley had the best yield. This test is now widely known. It's the two sample t-test. For confidentiality reasons, Gosset wasn't allowed to publish the results under his own name. Therefore, he was forced to publish his work under the pseudonym Student. And this is why the two sample t-test is now also known as Student's t-test. For me, the story shows that statistics is used to solve problems people encounter in their work and, who knows, maybe in your life as well.

The two sample t-test is a method to compare the means of two groups. And this is the learning objective for this video. To stay in the spirit of Gosset, let's take a look at an example from agriculture. Imagine you are growing tomatoes and you wish to maximize the yield, that is, the total number of kilograms of tomatoes that you harvest. For this, you experiment with two different fertilizers. Let's call them fertilizer A and fertilizer B.
You ask yourself, which fertilizer should I choose? Which fertilizer produces more tomatoes than the other? In order to answer this question, you set up a small experiment. How would you do that? A possible way is to take a field, divide it into 20 pieces, and plant tomatoes on each of them. Then you select ten pieces randomly and put fertilizer A on them. You put fertilizer B on the other ten sub fields. Next you wait until the tomatoes are grown. You harvest them and you measure the yield for each sub field. And this could be the data that you get.

Which fertilizer will you use? Well, if you want to practice with the material, you can load this data into Minitab and try to make a graph. Don't forget to pause the video, because you'll get the answer in the next slide. Ready? The graph you could get looks something like this. Of the two fertilizers, it's fertilizer B that results in a higher yield on average than fertilizer A. This is confirmed by the means. The mean yield of fertilizer A is equal to 5.66 and the mean yield of fertilizer B is equal to 8.4, which is quite a lot higher. So fertilizer B is better, right?

However, we have measured just ten fields for each type. Do you think that this difference in means is a coincidence, or will it be consistently better to use B? Would you be confident in concluding that B has a higher production? Well, this is where statistics will help you. You can use statistics to determine whether the difference in this small sample is a coincidence or not. As we have two groups here, fertilizer A and fertilizer B, we will test this with the two sample t-test.

So, let's do that in Minitab. I have loaded the data in Minitab and you see two columns, one for fertilizer A and one for fertilizer B. Now, to perform a two sample t-test, you have to go to the statistics menu, the Stat menu. Under Basic Statistics, you will find the 2-Sample t-test.
You have to see how your data is organized to be able to choose whether you have both samples in one column, or each sample in its own column, which is the case here. For sample one we select fertilizer A, and for sample two we select fertilizer B. Okay. Your Minitab session window should look something like this.

What do you think? Is fertilizer B still better than fertilizer A? Well, we see the means again, and we can now also see that the variation is a little bit higher for B than for A. Let's focus on our means. How do we determine whether the difference is truly significant, or maybe just a coincidence? For this we have to look at the difference in the means and the p-value. The estimated difference between the means of fertilizer A and fertilizer B is 2.74.

Now let's take a look at the p-value. The p-value is 0.015. But what does this mean? The p-value expresses a probability: the probability, given all the assumptions and given that the means of the two groups are in fact equal to each other, of finding a difference at least as large as the one in our sample. In this example, this probability is 1.5%, which is relatively small. So if the influence factor fertilizer had no effect on yield and the means were equal, data like these 20 fields would show such a large difference only about 1.5% of the time. Hence, if we put it in more normal English, the p-value is smaller than 0.05 and we conclude that the fertilizer has a significant effect on the yield.

Now, to actually get some knowledge out of this, we have to go back to our data and look at the mean levels. Because if we now know that the fertilizers are different, which is the one that we have to use? Well, that's obviously the one with the highest mean, and that's fertilizer B.

Let's summarize this. The two sample t-test is a very basic test to analyze the difference between the means of two groups. The p-value can be used to judge whether the difference between the two means is coincidental or truly significant.
Especially for small samples this is very useful. The two sample t-test is a very old test and forms the building block of many other tests that have been developed since.
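
If you would like to see what happens behind the Minitab menu, the experiment and the test from this video can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: the function names `assign_fertilizers` and `welch_t` are made up for this illustration, and the test statistic is computed without pooling the variances of the two groups (often called Welch's version of the two sample t-test), which may or may not match the options chosen in Minitab. The yields from the slide are not repeated here, so you have to supply the two columns yourself.

```python
import math
import random
import statistics


def assign_fertilizers(n_plots=20, seed=None):
    """Randomly pick half of the sub fields for fertilizer A; the rest get B.

    Hypothetical helper: sub fields are simply numbered 0 .. n_plots - 1.
    """
    rng = random.Random(seed)
    plots = list(range(n_plots))
    plots_a = sorted(rng.sample(plots, n_plots // 2))
    plots_b = [p for p in plots if p not in plots_a]
    return plots_a, plots_b


def welch_t(sample_1, sample_2):
    """Return (difference in means, t statistic, approximate degrees of freedom).

    Assumption: the variances of the two groups are NOT pooled.
    """
    n1, n2 = len(sample_1), len(sample_2)
    mean_1 = statistics.mean(sample_1)
    mean_2 = statistics.mean(sample_2)
    # Squared standard error of each sample mean: sample variance / n.
    se2_1 = statistics.variance(sample_1) / n1
    se2_2 = statistics.variance(sample_2) / n2
    diff = mean_1 - mean_2
    t_stat = diff / math.sqrt(se2_1 + se2_2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_1 + se2_2) ** 2 / (se2_1 ** 2 / (n1 - 1) + se2_2 ** 2 / (n2 - 1))
    return diff, t_stat, df
```

Calling `welch_t` with the yields of fertilizer A and fertilizer B gives the estimated difference and the t statistic. The p-value then follows from comparing that t statistic with a t distribution with `df` degrees of freedom, which is the step that statistical software such as Minitab does for you.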