[MUSIC] In week two, you may recall that we introduced some simple probability distributions. And towards the end of week two, we introduced the concept of a parameter, and we saw that different families of probability distributions had various parameters associated with them. Well, as I mentioned, this week is all about statistical inference, particularly the estimation branch of inference. So how do these two interconnect? Well, of course, in week two, if we knew the true theoretical probability distribution, it was possible to derive things like the expected value of the random variable, the expectation of X, which we typically denoted by the Greek letter mu. But what if we didn't know the true theoretical distribution? Suppose we only observe a random sample of data drawn from this wider population. We know that these parameters exist, but we may not know their true values, and hence the need for estimation, whereby we are going to infer, i.e., estimate, the values of unknown parameters relating to a wider population, which we don't know, based on our observed random sample of data, which we do observe. Of course, whenever we estimate something there is a chance we may be wrong, and hence there is going to be some uncertainty in the estimation we conduct.

So in this section, we're going to consider the concept of a sampling distribution. Now, this may seem a little theoretical, maybe a little abstract, but don't be deterred, because sampling distributions are vital in the use of statistical inference methods. So to illustrate this concept, let's take a very simple example. Imagine we have a population which consists of six people. We are on purpose using a very small population here, just for illustrative purposes. So remember we used capital N to denote our population size; let's say here capital N is equal to 6. And let's say we label these people A, B, C, D, E, and F, and suppose that the characteristic of interest is their monthly income. For A to F, we observe their monthly incomes as shown, and let's say these are in thousands of pounds. Now, interestingly, you might think these seem quite large monthly incomes. Well, given that the median income of people in the UK is around about 26,000 pounds a year, indeed, these do seem very high as monthly income figures. But, of course, here we are looking at people who have statistical training, and those with a quantitative background tend to command higher salaries, a great incentive for why you might wish to pursue your study of probability and statistics.

So in this case, given that we actually have data on the entire population, it is of course possible to calculate the population mean. Namely, we take the average of all of these monthly incomes, and we know for a fact that, for this very small population, the average monthly income is 6,000 pounds. But imagine we didn't actually know this figure because we didn't bother looking at the entire population. Now, I agree, if your population only had six individuals within it, conducting a census would be a very feasible thing to do. But to illustrate this concept of a sampling distribution, let's imagine we don't wish to observe all six, but instead take a sample of size two. So here little n is equal to 2. Now, we did speak briefly before about the differences between sampling with replacement and sampling without replacement.
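To make this setup concrete, here is a minimal Python sketch. It is not part of the lecture itself, and the individual incomes are assumptions reconstructed from the figures quoted in this example (in thousands of pounds), so treat them as illustrative only.

```python
# A minimal sketch of the toy population, assuming monthly incomes (in
# thousands of pounds) reconstructed from the figures quoted in the lecture.
incomes = {"A": 3, "B": 6, "C": 4, "D": 9, "E": 7, "F": 7}

N = len(incomes)                             # population size, capital N = 6
population_mean = sum(incomes.values()) / N  # the parameter we wish to estimate
print(population_mean)                       # 6.0, i.e. 6,000 pounds per month
```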
Of course, if we sample with replacement, we could potentially get the same individual more than once, and hence a duplicate observation. Suppose we don't want that here: given the very small size of our population, if we did replace, there would be a fairly high probability, by sampling standards, of getting a repeat observation. So let's suppose we opt to sample without replacement. So how many different random samples are possible from this population of size six? If we sample without replacement with samples of size 2, then in fact there are 15 distinct possible samples which could be selected. So, treating these as simple random samples, whereby each of these 15 samples is equally likely to occur, we are now in a position to build up our first simple sampling distribution.

So let's say the sample we observed happened to be individuals A and B. What is the sample mean? Remember the simple descriptive statistics from week three of this MOOC: to calculate a sample mean, we just add up the observations and divide by the number of observations. Here, individual A has a monthly income of 3,000 pounds and individual B has a monthly income of 6,000 pounds, and a simple average of those, 3,000 + 6,000 divided by a sample size of 2, gives us a sample mean of 4,500 pounds. And, of course, we are in a position to calculate the sample mean not just for that sample of A and B, but for all of those 15 possible samples. Now, clearly, as we have different individuals as we go from one sample to another, and some variation in the monthly incomes across all six individuals, it is no surprise that we get different sample means depending on the observed sample that we have.

So with this in mind, we are in a position to build up our sampling distribution. Keep in mind that a sampling distribution is simply a probability distribution. Thinking back to week two, how did we define a probability distribution? Well, we considered the sample space, all possible values of the variable, and then we attached a probability of occurrence to each of those, reflecting just how likely they were to occur. Here, this is no different from that concept in week two, other than that we're now looking at the distribution of a statistic, specifically in this instance the distribution of the sample mean, X bar. So contrast that with those probability distributions from week two, where we considered the distribution of some random variable X. Here, we want the distribution of X bar rather than X, whereby the sample mean is being used in a special capacity, in that we will take the value we observe for our sample mean and use it to estimate the value of the population mean, the key parameter of interest. Now, of course, depending on the sample we observe, sometimes we get a sample mean which is very different from the true mean and sometimes we get a value which is quite close to it. Indeed, if we look at the different sample means across those 15 possible samples, each of size 2, drawn from this population of size 6, we see that the sample means range from 3.5, which remember here represents 3,500 pounds as a monthly income, to as high as 8,000 pounds. So for example, the samples of D and E, and D and F, both lead to sample means of 8,000 pounds, which remember is above the true population mean monthly income of 6,000 pounds.
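Continuing the sketch above (again, not from the lecture, and still using the reconstructed incomes as an assumption), here is one way to enumerate all 15 possible samples of size two drawn without replacement, compute each sample mean, and tabulate the resulting sampling distribution of X bar.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Same assumed incomes as in the earlier sketch (thousands of pounds).
incomes = {"A": 3, "B": 6, "C": 4, "D": 9, "E": 7, "F": 7}

# Every possible sample of size n = 2 drawn without replacement, with its sample mean.
sample_means = {pair: (incomes[pair[0]] + incomes[pair[1]]) / 2
                for pair in combinations(incomes, 2)}
print(len(sample_means))         # 15 distinct possible samples
print(sample_means[("A", "B")])  # 4.5, i.e. a sample mean of 4,500 pounds

# Sampling distribution of the sample mean: each possible value of x-bar,
# together with the probability that a simple random sample produces it.
counts = Counter(sample_means.values())
for xbar, count in sorted(counts.items()):
    prob = Fraction(count, len(sample_means))
    print(f"x-bar = {xbar}: probability {prob} (about {float(prob):.1%})")
# e.g. x-bar = 3.5 occurs with probability 1/15 (about 6.7%),
#      x-bar = 8.0 with probability 2/15 (about 13.3%).
```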
Indeed, of these 15 samples, only one has a sample mean which is exactly equal to the true population mean of 6,000 pounds, and that would occur if we observed the sample of individuals A and D. Now, of course, it's possible that we would get that particular random sample, but it's also possible, and far more likely, that we would get one of those other 14 possible samples. So to build up this sampling distribution, we now have our sample space, all possible values of X bar, and we attach probabilities of occurrence to each of these, reflecting how likely they are to occur, i.e., how frequently we get these various values across those 15 different samples. So in the case of an average monthly income of, let's say, 3,500 pounds, which we get when individuals A and C are in our sample, we see this occurs only once out of those 15 possible samples. So 1 in 15 equates to about 6.7%, and hence that is the probability we would assign to getting a sample mean, in this case, of 3.5. Of course, we saw two instances where we got a sample mean of 8,000 pounds for this monthly income, remember, those samples of D and E, and D and F, and hence that equates to 2 over 15, so about a 13.3% chance of that event. So we can now repeat these calculations, viewing the percentages as probabilities, across all values of X bar within our sample space, and here we see our first example of a sampling distribution.

So given that the true population mean was 6,000 pounds, we see that only 1 of those 15 samples would have led to a sample mean, X bar, of exactly that amount; of course, that occurs if we observed individuals A and D in our sample. If we observed any of those remaining 14 samples, there would be a difference between our sample mean and the true parameter value. Indeed, just scanning through, we can see that there's about a 67% chance, i.e., two-thirds of the time, that we end up with a sample mean which is within 1,000 pounds of that true monthly income, i.e., a sample mean between 5,000 and 7,000 pounds. The flip side is that there is about a 33% chance, or one-third probability, that our sample mean lies more than 1,000 pounds from that true mean. So in this instance, because we know the true mean, we can see whether there is any sampling error, i.e., a difference between the sample mean and the true mean. This, again, conceptually illustrates the importance of a sampling distribution. In practice, when we don't know the value of the true parameter, any estimate we derive from a simple random sample drawn from this population carries a risk that our point estimate may be wrong, and hence this introduces uncertainty into our point estimation. Of course, as we will show you later on, we will need to quantify this uncertainty, but nonetheless, we've now established a very important concept, that of a sampling distribution. Ideally, we would like a sampling distribution which is tightly concentrated around the true value of the parameter. So in the next section, we're going to consider more broadly different types of sampling distributions, but this serves as a very helpful starting point. [MUSIC]
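As a final supplementary sketch (again assuming the reconstructed incomes, and not part of the lecture), here is a quick check of the sampling-error figures quoted above: the chance that the sample mean lands within 1,000 pounds of the true mean, and the chance that it equals the true mean exactly.

```python
from itertools import combinations

# Same assumed incomes as in the earlier sketches (thousands of pounds).
incomes = {"A": 3, "B": 6, "C": 4, "D": 9, "E": 7, "F": 7}
true_mean = sum(incomes.values()) / len(incomes)  # 6.0, i.e. 6,000 pounds

# Sample mean of every possible sample of size 2 drawn without replacement.
means = [(incomes[a] + incomes[b]) / 2 for a, b in combinations(incomes, 2)]

within_1000 = sum(abs(m - true_mean) <= 1 for m in means) / len(means)
exactly_right = sum(m == true_mean for m in means) / len(means)
print(round(within_1000, 3))    # 0.667: about a two-thirds chance of being within 1,000 pounds
print(round(exactly_right, 3))  # 0.067: only 1 sample in 15 hits the true mean exactly
```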