We continue our look at different sampling techniques. Now, remember why we wish to take a sample in the first place. Sometimes it may be neither practical nor economical to consider the entire population, and hence a census, i.e. an enumeration of the entire population, is not feasible. For example, if we were considering the electorate of a country, which may number many millions, clearly it's not feasible to ask every single person in an opinion poll, so we simply take a sample, maybe of about a thousand voters, to determine their voting intentions. Of course, even then there may be a lot of non-response bias, because people may not be willing to reveal their voting intention to a pollster. There may also be response bias, in that in the privacy and anonymity of the polling booth an individual may be inclined to vote for, perhaps, a more extremist party, and may not be willing to admit that face to face to a pollster. But setting those technicalities aside, it just wouldn't be feasible to ask all of the many millions of people within that electorate. Alternatively, perhaps we want to test the reliability of our products, where the testing is done to destruction; again, it would not be feasible to test the entire population of products. For example, imagine you are a tire manufacturer. When you put tires on a car, they won't last forever: they gradually degrade and can no longer be used after a certain number of miles or kilometers. So let's imagine the manufacturer made a claim about how long these tires would last. This has some echoes of hypothesis testing, to come in week five. Clearly, it would not be practical for the tire manufacturer to test every single tire it produces, because the testing is to destruction, and hence it would never have any tires left to sell; they would all be used up in the testing stage.
So inevitably, the manufacturer would have to take a sample, potentially a random sample (more on that momentarily), and test on a car how many thousands of miles or kilometers those tires lasted. So very often we are forced to consider a sample rather than the wider population. Now, we mentioned in the last section different types of non-random, or so-called non-probability, sampling: convenience, judgmental, quota and snowball sampling. Here, we're going to consider some different forms of probability sampling, sometimes referred to as random sampling. Before we do, I'd like us to do a little experiment by playing a little game. I'd like you all to choose at random a number between 1 and 10. Now, don't tell me. Well, of course, this is a video, so even if you did, I couldn't hear you. But let's assume by this stage you have indeed chosen a number at random between 1 and 10. Now, I'm going to make a prediction: you chose the number seven. Again, it's a video, so you can't respond to me and I can't hear your response. But I'm guessing that many of you would have chosen, if not seven, one of the other high numbers, maybe eight or even nine. Why? Because in human experiments, seven is a number which seems to resonate with many people. I don't know whether there are psychological or perhaps cultural reasons behind it, but seven is a very popular choice. Of course, if you chose one of the low numbers, like one or two, you may wonder what on earth I'm talking about. But this again is an example of sampling. Each of you listening to this video represents a single individual, an observation, a sample of size one. Given this is a massive open online course, hopefully there are many hundreds, thousands, who knows, millions of people tuning in, and hence collectively we have a very large sample size.
Now, if it were actually possible to collect all of your individual choices and draw a simple histogram of them, remembering the data visualization earlier in the course, then I reckon we would find that the modal answer, or most frequently occurring value, would be seven or some high number close to it, because there's generally a bias among individuals towards choosing higher numbers rather than lower numbers. So what's the purpose of this? Well, remember, I asked you to choose your number at random. You may have felt that you chose your number at random, but of course it was your brain making that choice. And we as human beings, no matter how objective we try to be, are inevitably subjective, biased individuals, even when, as here, the task is not controversial at all. So if you were biased towards seven, or one of those higher numbers, when choosing a number between 1 and 10, you weren't really choosing that number at random. Your brain was in control of that choice, and hence that was an example of non-random sampling. And if we're biased in such a harmless situation, we are likely to be even more biased in much more controversial situations. So this was in fact an example of non-probability sampling, because the probability of your choosing each of those 10 values was not known. And indeed, that is the main distinction between random, or probability, sampling and non-random, or non-probability, sampling: in the probabilistic case, there is a known probability of selection for each individual within our population. Indeed, in order to conduct any kind of probability or random sampling (the two terms being used interchangeably), one must have a list of the population, what we refer to as a sampling frame. And if we have the sampling frame, we can then attach known probabilities of selection to each individual member of that population.
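For contrast with the human "random" choice, here is a minimal sketch in Python of what a genuinely random choice looks like: a pseudo-random number generator picks each of the values 1 to 10 with a known probability of 1/10, so across many simulated people each value's relative frequency should come out close to 0.1 and the histogram would be roughly flat, not peaked at seven. The function name `simulate_uniform_choices`, the 100,000 simulated people and the seed are illustrative assumptions.

```python
import random
from collections import Counter

def simulate_uniform_choices(num_people: int, seed: int = 2024) -> Counter:
    """Simulate num_people each picking an integer from 1 to 10,
    every value having the same known probability of 1/10."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    # randint(1, 10) includes both endpoints
    return Counter(rng.randint(1, 10) for _ in range(num_people))

NUM_PEOPLE = 100_000  # hypothetical number of participants
counts = simulate_uniform_choices(NUM_PEOPLE)

# Relative frequency of each value; all should be close to 0.1
for value in range(1, 11):
    print(value, round(counts[value] / NUM_PEOPLE, 3))
```

Unlike the human experiment, here the probability of each value is known in advance, which is exactly what makes this a probability (random) selection.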
When I say member, this is not necessarily referring to a population of people; we may be referring to objects instead. Now, it's all very well saying we need a list, a so-called sampling frame, but in practice that may be difficult to achieve. If we were looking at an electorate, presumably the electoral body in the country would have a list of all eligible voters, but this may not be accurate or complete. Not everybody who is eligible to vote is necessarily on the electoral register. Of course, people do have a habit of dying occasionally, so some of the people listed on the electoral register may have passed away since it was compiled. There may be some duplications, and maybe some people have migrated to another country and haven't informed the electoral authorities. So a sampling frame may not perfectly capture our target population. Another example might be a supermarket trying to survey its customers. If it has a loyalty card scheme, it would have a clear database of loyalty card holders, but of course there would be some coverage bias here, because it's unlikely that every shopper at that supermarket would have signed up for the loyalty card. So it's not a perfect world, and we're unlikely to come across a perfect sampling frame, but we'll do the best we can given the real-world constraints we face. By contrast, if we were looking at the population of students within a university, clearly the registry office of that university must have a complete database of all students registered on a particular course. So to do any type of random or probability sampling, we will require this list of population members.
So, just very briefly, to run through the different types of random sampling techniques: the simplest one is simple random sampling, hence the name, whereby we attach an equal probability of selection to each member within that list of population members. For example, if we had a population of a thousand individuals, each of them would have the same probability, one over a thousand, of being selected at the first stage of selection. If we decided to sample with replacement, whereby once an individual is selected he or she is returned to the sampling frame and could be selected again at a second or later stage of selection, then the probabilities of selection would remain unchanged. Of course, we may not wish to observe the same person more than once, because they're just going to give us a duplicated set of responses. So instead, we may prefer to sample without replacement, whereby once someone is selected, they are not returned to the pool of people from which the sample is subsequently drawn. So out of those thousand individuals, once the first person is selected, they are permanently excluded from any remaining stages of selection, leaving 999 people. Hence, each of those would have an equal probability of selection at the second stage, namely one over 999. With simple random sampling, one hopes to remove any selection bias, because we defer the selection to a random number generator, or strictly a pseudo-random number generator, as built into many computer packages. For example, Microsoft Excel has a pseudo-random number generator built in, and hence we get a random sample of size little n. Remember, little n is our common notation for the sample size.
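The two schemes can be sketched in Python, where the `random` module's pseudo-random number generator plays the role Excel's does in the description above. This is a minimal sketch under stated assumptions, not the course's own method: the sampling frame is a hypothetical list of IDs 1 to 1000, the sample size n = 10 and the seed are arbitrary, and `selection_probability` is an illustrative helper. `choices` draws with replacement, so the same ID can appear twice; `sample` draws without replacement, so every selected ID is distinct.

```python
import random
from fractions import Fraction

# Hypothetical sampling frame: IDs 1..1000 standing in for a list of population members
sampling_frame = list(range(1, 1001))
N = len(sampling_frame)  # population size
n = 10                   # sample size, "little n" (arbitrary choice)

rng = random.Random(42)  # pseudo-random number generator, seeded for reproducibility

# With replacement: every draw is from all N members, so duplicates are possible
with_replacement = rng.choices(sampling_frame, k=n)

# Without replacement: a selected member is not returned to the pool
without_replacement = rng.sample(sampling_frame, k=n)

def selection_probability(stage: int, replace: bool) -> Fraction:
    """Illustrative helper: probability that any one eligible member
    is picked at a given stage of simple random sampling."""
    remaining = N if replace else N - (stage - 1)
    return Fraction(1, remaining)

print(selection_probability(1, replace=True))   # first stage: 1 over 1000
print(selection_probability(2, replace=True))   # unchanged with replacement
print(selection_probability(2, replace=False))  # 1 over 999 without replacement
```

Sampling without replacement avoids duplicated responses at the cost of selection probabilities that change slightly from stage to stage, exactly as in the 1/1000 then 1/999 arithmetic above.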
However, there is a danger that even a simple random sample is not representative of the whole, purely by chance, through the luck of the draw of our random number generator. For example, you might end up with all males or all females in your sample purely by chance, despite the absence of any selection bias. But inevitably, if you've only got one gender in your sample, the views of that gender may not encompass those of the other gender. So even random selection is no guarantee of representativeness. Hence, we will require a few other kinds of random sampling techniques, which we'll consider in the next section, and which will hopefully allow us to achieve a more representative sample.
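To put a number on "purely by chance", here is a minimal sketch assuming a hypothetical population of 1,000 people split evenly into 500 males and 500 females, sampled by simple random sampling without replacement. The exact probability that everyone selected shares one gender follows from counting combinations (a hypergeometric calculation), and a quick simulation with Python's `random.sample` should agree with it. The population split, sample sizes, trial count and function names are illustrative assumptions, not figures from the course.

```python
import random
from math import comb

def prob_single_gender(pop_male: int, pop_female: int, n: int) -> float:
    """Exact probability that a simple random sample of size n, drawn
    without replacement, is all male or all female."""
    total = comb(pop_male + pop_female, n)
    return (comb(pop_male, n) + comb(pop_female, n)) / total

def simulate_single_gender(pop_male: int, pop_female: int, n: int,
                           trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    frame = ["M"] * pop_male + ["F"] * pop_female  # hypothetical sampling frame
    hits = 0
    for _ in range(trials):
        sample = rng.sample(frame, n)  # without replacement
        if len(set(sample)) == 1:      # only one gender present
            hits += 1
    return hits / trials

# Hypothetical 500/500 population; how likely is a one-gender sample?
for size in (2, 5, 10):
    print(size, round(prob_single_gender(500, 500, size), 4))
print("simulated, n = 5:", simulate_single_gender(500, 500, 5))
```

For a sample of just two people the chance is 499/999, almost one half; it falls quickly as little n grows, but for small samples it is far from negligible, which is part of the motivation for the other random sampling techniques.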