In this video, we're going to talk about assigning participants to conditions. One of the most basic questions you can ask here is: should every participant use every alternative? To illustrate this, we're going to walk through an example and see the trade-offs intrinsic to the different approaches you can take. In this study, we're going to compare a smart vacuum cleaner with a more traditional vacuum cleaner. As you remember from last time, one of the first questions we should ask is: what are the measures? Do we want something that is faster, cleaner, less fatiguing, more ergonomic, lower in environmental impact, more portable, easier to use, or less error-prone? Any of these are valid measures; you might even keep track of them all. For our example here, though, we're going to pick two: faster and cleaner. So our manipulation is going to be vacuum type, and our measures are going to be speed and cleanliness. To help illustrate this example, I'm going to recruit the help of my graduate students and TAs, and here they are. One thing you can imagine is assigning half of them to each condition. That's called a between-subjects design: we could assign half of them to the robot interface and half of them to the traditional interface. So Chinmi, Nicolas, and Alex get the robot, and Dan, Ranjitha, and Jesse get the traditional one. Let's say, for example, that the robotic interface performs better: faster, cleaner. With only six people, it's hard to say whether that's because the vacuum cleaner is better, or whether the three people we happened to assign to it (Chinmi, Nicolas, and Alex) were simply better cleaners than Dan, Jesse, and Ranjitha. The other strategy is a within-subjects design, where everybody uses both interfaces. The problem there is that somebody is probably going to use one interface first.
If they use the old interface first, they've gotten practice, and they may be extra good, or maybe tired, when they get to the new one, and vice versa. So between-subjects designs have to worry about individual differences, and within-subjects designs have to worry about ordering effects. Is there some way we can get the best of both worlds? How might we address these ordering effects? One thing we can do is take half the people and have them try one interface first, and have the other half try the other interface first. This is called counterbalancing, and it has a couple of nice properties. For starters, you can treat it as a completely between-subjects design if you look at only the first task that people do. And hopefully the benefit you get from practice, or the detriment of fatigue, is roughly the same in both conditions, so it evens itself out. Counterbalancing washes out some of those concerns. There are times where this isn't the case, and that's a great time to use a between-subjects design. To minimize learning effects, it's probably best to make the first and second tasks different. So in this case, we might have people clean the d.school as their first task and clean the Gates building as their second. This also has the added benefit of giving us more ecological validity: by trying out a couple of different environments, we know it's not just the Gates building that's better cleaned by robots. It also has the benefit that both the d.school and the Gates building end up clean. How about individual differences? Should we try to balance for shirt color, hair length, gender, or whether the picture in someone's icon is squarish or more rectangular? It depends on whether you think it's going to make a difference in the study, or whether one might plausibly believe it would. The simplest thing to do is truly random assignment; we'll talk about more sophisticated techniques in a couple of slides.
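To make this concrete, here's a short Python sketch of two-condition counterbalancing. The function and condition names are just illustrative, not from any real toolkit: it randomly splits participants in half and gives each half the opposite task order.

```python
import random

def counterbalance(participants, conditions=("robot", "traditional"), seed=None):
    """Randomly split participants in half; each half tries the two
    conditions in the opposite order, so practice and fatigue effects
    roughly cancel out across conditions."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    plan = {}
    for p in shuffled[:half]:
        plan[p] = list(conditions)            # e.g. robot first
    for p in shuffled[half:]:
        plan[p] = list(reversed(conditions))  # traditional first
    return plan

plan = counterbalance(["Chinmi", "Nicolas", "Alex", "Dan", "Ranjitha", "Jesse"], seed=1)
```

Note that if you look only at the first entry of each participant's order, you get exactly the between-subjects split described above.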
Okay, so now we know a couple of ways of dealing with two different alternatives. What if we have three? With three alternatives, you can use something called a Latin Square. I'll explain how this works with three conditions; it generalizes to more. In a classic Latin Square design, each person is going to use all three conditions. You randomly divide your participants into three groups. The first group gets the first condition first, the second one second, and the third one third. The second group gets the order 2-3-1, and the third group gets 3-1-2. So again, everybody gets all three conditions, but their order changes. And if you look at any particular ordering slot, what people see first, what people see second, what people see third, you can see that it is also evenly balanced across the three conditions. Whether you choose a between-subjects approach or a within-subjects approach, the most important thing is to make sure that the odds of any particular participant ending up in a particular condition, or a particular condition ordering, are completely random and even. We can illustrate this with an example. Say you wanted to find out whether people type faster in the morning or in the afternoon, and you allow people to come in whenever they want. What if people who prefer the morning, morning people, are faster typists than people who prefer the afternoon? Your conclusion would be that the morning is faster. But that's not right: the morning people were faster, not the morning itself. Or maybe not; you can't say, because it's a confound. It's possible that the causal reason was a population difference and not your experimental manipulation. This confound is why a lot of economics is so hard: you're computing correlations, but there's no manipulation. Random assignment is tool number one in achieving an effective manipulation.
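The 1-2-3 / 2-3-1 / 3-1-2 rotation described above is easy to generate; here's a minimal Python sketch of a cyclic Latin square, which is what that classic three-condition design is:

```python
def latin_square(conditions):
    """Cyclic Latin square: group i gets the conditions rotated by i,
    so every condition appears exactly once in each row (group) and
    each column (ordering slot)."""
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]

orders = latin_square([1, 2, 3])
# orders == [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
```

One caveat worth knowing: a cyclic square balances which condition lands in each slot, but not which condition follows which (here, 2 always follows 1); balanced Latin squares address that as well.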
So in our typing case, you would want to assign people to either the morning condition or the afternoon condition. The morning-and-afternoon example can seem kind of stylized, but it shows up a lot in the real world. For example, if you're running a website, one easy thing you might do is show everybody one alternative on one day and the other alternative on the next. Well, there may be a difference between those days. We'll talk more about running experiments later on, but for now I wanted to point out that if you're going to do something like this, the key is to make sure that people are randomly assigned. The easiest way, in most cases, is a between-subjects design where you assign people, as they come in, to see either one interface or the other. Here's another example of the importance of random assignment. In the 1930s, some studies were run at the Western Electric factory outside Chicago, called Hawthorne. The plan was pretty simple: find out whether changes in lighting levels affected productivity. So experimenters came in and raised the lighting levels, and productivity went up. Then they tried lowering the light levels, and productivity went up. They tried a whole bunch of combinations, and after each intervention, productivity went up. The conclusion, of course, is that it was the act of intervening, rather than the light levels themselves, that was the major cause behind the productivity change. Presumably the workers felt like people cared about them, or there was the excitement of the experiment, or whatever; that was the driving factor. In recent years, some economists have questioned whether in fact there was a Hawthorne effect at the Hawthorne plant. If you're curious, you can Google more about that. In either case, the name stuck, and it refers to a case where the effect you're seeing is a result of the intervention itself rather than the thing you were trying to study. You can avoid this with random assignment.
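For the website case, one common sketch is to hash each visitor's id into a condition as they arrive. This is an illustrative pattern, not a specific product's API, and it assumes you have some stable `user_id` string: assignment is then effectively random with respect to day and time of arrival, and a returning visitor always sees the same interface.

```python
import hashlib

def assign(user_id: str, conditions=("A", "B")) -> str:
    """Deterministically hash a visitor id into a condition: effectively
    random with respect to arrival day/time, but stable per visitor."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return conditions[int(digest, 16) % len(conditions)]
```

Because the hash is uniform, roughly half the visitors land in each condition regardless of which day they show up.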
We've talked about counterbalancing the order of conditions that participants experience. You can also counterbalance how you assign people to conditions. Say, for example, you're worried that typing speed will differentially affect something in your interface; you're building a new spreadsheet or something like that. You could use a pretest to establish typing speed ahead of time and use that to assign people to conditions. There are many techniques for doing this. The simplest is just to look at high-speed typists versus low-speed typists. The key, no matter what, is that each participant has an equal chance of ending up in either condition. Let's walk through an example. If you can pretest everyone ahead of time, one slick thing you can do is form matched pairs. Say we get the typing speeds we see here, and after ordering they look like this. We can group them into pairs, and then for each pair we can conceptually flip a coin to decide which of them lands in which of the two conditions. I got one of these dollar coins in the ticket machine the other day; it will do well for flipping. So, for 35 and 37, it's heads: we'll put 35 down here in the first condition, and 37 goes in the second. For 57 versus 59, it's tails, so 59 goes here. The third one's heads, so that gives us 61 and 68. And tails, so 99 goes here and 70 goes here. By doing this matched-pairs assignment, you're approximately balancing out the performance of people in each condition, and by having some randomness in there, you're ensuring you don't get accidental statistical artifacts creeping in, as you might by, say, assigning all your odds here and all your evens there. But say people are coming in online, so you can't pretest people for the experiment. What you can do is pick some threshold that you think is about in the middle. For typing, you might say 65 words a minute.
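Here's a minimal Python sketch of the matched-pairs procedure just described: sort by pretest score, pair neighbors, and flip a coin per pair. The function name is just for illustration.

```python
import random

def matched_pairs(scores, seed=None):
    """Sort pretest scores, pair adjacent participants, then flip a
    coin per pair to decide who goes to condition A and who to B."""
    rng = random.Random(seed)
    ranked = sorted(scores)
    cond_a, cond_b = [], []
    for i in range(0, len(ranked) - 1, 2):
        lo, hi = ranked[i], ranked[i + 1]
        if rng.random() < 0.5:   # heads: lower of the pair goes to A
            cond_a.append(lo)
            cond_b.append(hi)
        else:                    # tails: higher of the pair goes to A
            cond_a.append(hi)
            cond_b.append(lo)
    return cond_a, cond_b

a, b = matched_pairs([35, 37, 57, 59, 61, 68, 70, 99], seed=7)
```

Each pair is always split across the two conditions, so the groups stay matched on typing speed no matter how the coin flips land.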
As people come in, you can check whether they're above or below that threshold and label them as high or low, fast or slow typists, based on that. So 35 we would say is low, 40 is low, 90 is high, 68 is high, and so on. And you can assign them to your two different conditions, call them A and B, by high and low. So our first low person to come in, 35, gets tails, so they go to B. And then, to balance that out, 40 goes to A. The next time we get a pair of lows, we flip the coin again, and we do the same for the highs. So the highs come in, and it's heads: our first high goes to A and our second high goes to B. You don't need to make sure you have even numbers of high and low typists, unless you're worried about that making a material difference in the outcome. In fact, if you have enough participants, you can look at this two-by-two grid and compare the outcomes in the four cells. What you do need to make sure is that there are the same number of fast (high) typists in A as in B, and the same number of slow (low) typists in A as in B. There are lots of ways to do this kind of counterbalancing. In general, all of them are doing the same thing, which is trying to help the law of large numbers that you get in a between-subjects study work a little bit faster. Now, there's a danger in assigning people based on a pretest that I'd like to warn you about. Say we wanted to pretest for coins that are more likely to come up heads. I have some coins here. I can flip each of them a couple of times: heads, tails, and tails. So we have three coins with a heads tendency and three coins with a tails tendency. To make it a little more exciting than the six coins I had in my pocket, I decided to generate 30 coins in Excel, and I flipped each of them 21 times. You can see the results here of how many heads each got out of 21 tosses. I picked an odd number so each coin would have to lean either heads or tails on the whole.
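The arrive-as-you-go version can be sketched as a tiny class (the class name and details are illustrative): within each stratum, the first of each pair of arrivals is assigned by coin flip and the second gets the opposite condition, so fast and slow typists stay balanced across A and B.

```python
import random

class StreamingBalancer:
    """Assign arriving participants to conditions A and B while keeping
    equal numbers of fast and slow typists in each condition."""

    def __init__(self, threshold=65, seed=None):
        self.threshold = threshold
        self.rng = random.Random(seed)
        self.pending = {}  # stratum -> condition owed to the next arrival

    def assign(self, wpm):
        stratum = "fast" if wpm >= self.threshold else "slow"
        if stratum in self.pending:         # second of a pair: balance it
            return self.pending.pop(stratum)
        cond = self.rng.choice(["A", "B"])  # first of a pair: coin flip
        self.pending[stratum] = "B" if cond == "A" else "A"
        return cond
```

Each participant still has an equal chance of landing in A or B, but within each stratum the counts never drift more than one apart.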
What we can see if we rank them by number of heads is that 15 of them had a heads tendency and 15 had a tails tendency; the average of the heads group is 12.9 and the average of the tails group is 8.3. I can flip them all again, and I've highlighted in yellow and bolded the numbers that had a heads tendency the first time. There's essentially no correlation between the two rounds. It does turn out that our initial heads have an average of 10.7 and our initial tails have an average of 9.6, so we can see that both of these regress toward the mean: both are closer to the expected value of 10.5. We may see a perceived difference here between the 9.6 and the 10.7, and in a future lecture we'll learn how to test whether that's a real difference or a statistical mirage. This danger of regression to the mean is very real, and it shows up all the time in statistical analyses, in things like low-performing schools improving the next year or high-performing schools declining. The assignment problem that produces regression to the mean is when you draw something like a dividing line, put everybody who scores high in one group and everybody who scores low in another, and then measure their performance subsequently. If instead you use the pretest to counterbalance, like we did with the typists, so we put high- and low-speed typists in both conditions, then you no longer have that worry about regression to the mean. So the major question we tackled here is: should every participant use every alternative? And with it emerged three major strategies. In a within-subjects design, everybody tries all the options. This has big benefits in terms of recruiting participants: you get more work out of each person. And it works really well when you're not worried about learning, practice, or exposure issues, where trying one version would pollute the other. In a between-subjects study, each person tries one condition.
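You can reproduce the 30-coin demonstration in a few lines of Python rather than Excel. This is a sketch with illustrative function names; the exact averages depend on the random seed, but both groups' second-round means land near the expected 10.5.

```python
import random

def heads_in_21(rng):
    """Number of heads in 21 flips of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(21))

def regression_demo(n_coins=30, seed=3):
    """Flip each coin 21 times, label coins 'heady' or 'taily' by the
    first round, then reflip everything: both groups' second-round
    averages regress toward the expected value of 10.5."""
    rng = random.Random(seed)
    first = [heads_in_21(rng) for _ in range(n_coins)]
    second = [heads_in_21(rng) for _ in range(n_coins)]
    heady = [s for f, s in zip(first, second) if f > 10.5]
    taily = [s for f, s in zip(first, second) if f <= 10.5]
    return sum(heady) / len(heady), sum(taily) / len(taily)
```

The first-round averages of the two groups are far apart by construction, since the split was defined by that round; the second-round averages collapse back toward 10.5 because the coins were fair all along.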
This requires more people, and you may want to consider counterbalancing for fair assignment. It has the benefit, of course, that each participant is uncorrupted by exposure to the other condition, and for this reason it's the most common technique we see in things like web studies. And if you use a between-subjects design, you can use counterbalancing to help even things out. What I offered today is just a really high-level overview, and I've necessarily glossed over a whole bunch of important details. But I wanted to give you an initial lay of the land for running studies. If you're interested in more reading in this area, my favorite book is David Martin's Doing Psychology Experiments. I'll see you next time.