So a question that we've talked about before in experimental design is, how much testing do we need to do? How much replication of a design is necessary? I'd like to talk about that in the context of this chemical process experiment, the 2² design that we just spent some time looking at. Suppose we want to detect effects of size two sigma, where sigma is the standard deviation of the random variability in the data. If the basic 2² design is replicated twice, so we have a total of eight runs, then we have four degrees of freedom for error. That, by the way, is a model-independent estimate of error: pure error. If we use an alpha of 0.05, we get a power of about 0.57, 57 percent. That's really too low; you want the power to be higher. Well, you could do more replication. On the other hand, you could use a higher type I error rate. As you increase the type I error rate, the power will go up. Is there a danger in that? Well, yeah, there's a danger of making more type I errors, of course. But in screening experiments, where we're trying to figure out which factors are important and we have a relatively large number of them, type I errors really don't have the same impact that type II errors do. Because if you inadvertently think that a factor is important when it really isn't, you'll discover that subsequently. But if you think something is not important when it really is, you may never discover that, and that could have really negative impacts on your product or your system or your process or whatever you're trying to improve. Failing to identify an active factor is a big problem. So sometimes in screening work, experimenters will consider higher type I error rates: not 0.05, but maybe 0.1, or maybe even 0.2. If you use an alpha of 0.1 in our chemical process experiment, the power is 0.75, which is really pretty good. If you use an alpha of 0.2, the power rises to 0.89, 89 percent, very good.
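Power figures like the ones quoted here can be checked with the noncentral t distribution, since the test on a factorial effect is a t-test against the pure-error estimate. Here is a minimal sketch; the function name and the scipy-based approach are my own, not from the lecture:

```python
from scipy import stats

def power_2k(k, n_reps, effect_sigma=2.0, alpha=0.05):
    """Approximate power to detect an effect of size effect_sigma * sigma
    in a 2^k factorial replicated n_reps times, via the noncentral t."""
    n_runs = 2**k * n_reps                 # total number of runs
    df_error = 2**k * (n_reps - 1)         # pure-error degrees of freedom
    se_effect = (4.0 / n_runs) ** 0.5      # std. error of an effect, in sigma units
    nc = effect_sigma / se_effect          # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df_error)
    # two-sided power: probability |t| exceeds the critical value
    return (1 - stats.nct.cdf(t_crit, df_error, nc)) + stats.nct.cdf(-t_crit, df_error, nc)

print(round(power_2k(2, 2, alpha=0.05), 2))  # the ~0.57 case discussed above
print(round(power_2k(2, 2, alpha=0.10), 2))
print(round(power_2k(2, 2, alpha=0.20), 2))
```

Raising alpha or adding replicates both raise the noncentrality-to-critical-value margin, which is why the power climbs in the cases discussed above.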
But you might still want to keep the significance level at five percent and add replicates to see if you can get better power. Suppose, for example, that you use three replicates. With three replicates you now have eight degrees of freedom for pure error, and if you want to detect effects of size two sigma with an alpha of 0.05, that gives you a design with a power of almost 86 percent. That's a very good value. So the decision to use three replicates here was a very good decision, and I always think that for a 2² design, three replicates is a good target. Software can do these calculations for you. This is the output that you get from JMP if you input an anticipated RMSE of one and a significance level of 0.05. The fact that the anticipated coefficients are all one says that the effect size is two standard deviations. We're going to talk now about the next case of the 2^k factorial, the 2³. The 2³ is an eight-run design, and geometrically those eight runs can be displayed at the corners of a cube, as you see here. Look at the labels and make sure that they make sense to you. For example, little ab has A at the high level, B at the high level, and C at the low level. Little bc has B at the high level, C at the high level, and A at the low level. So make sure that the labeling system makes sense to you. Then on the right-hand side, part (b) of this figure shows you the design matrix, all eight runs, and the eight runs here are displayed in standard order. Look at the standard order. The column for A alternates minuses and pluses, the column for B alternates pairs of minuses and pluses, and the column for C is four minuses followed by four pluses. So anytime we refer to standard order, this is the way that you would see the design matrix displayed. Now, of course, you would not run the experiment in this order; you would randomize the order of the runs.
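The standard-order pattern just described (the j-th column changing sign every 2^(j-1) runs) is easy to generate programmatically. A short sketch, with a helper name of my own choosing:

```python
from itertools import product

def standard_order(k):
    """Runs of a 2^k design in standard (Yates) order: the first factor's
    column alternates -,+ every run, the second every two runs, and the
    j-th factor's column every 2**(j-1) runs."""
    # itertools.product varies its last element fastest, so reverse each row
    return [row[::-1] for row in product((-1, 1), repeat=k)]

for run in standard_order(3):
    print(run)
```

The first run printed is (-1, -1, -1), the treatment combination (1); the last is (1, 1, 1), which is abc.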
If you ran the experiment in standard order, the first four runs would all have C at the low level and the last four runs would have C at the high level. That's a really bad idea, because any time-related trend in the process would then be confounded with the factor effects. So randomizing the order of the runs is always a good thing. But knowing what the standard order is and how to generate that standard order is sometimes very useful. How about estimating the effects? Well, here's how it's done. To estimate the main effect of any factor, you take the average of the runs where that factor is at the high level and subtract the average of the runs where that factor is at the low level. As you can see from the top row of this display, those are the runs on opposite faces of the cube. The two-factor interactions are a little more complicated, but it's the same idea: the difference in diagonal averages, just as it was in the 2². But now those diagonals are not lines going through a square; they're planes going through the cube. Here's a display that shows you the three sets of planes along which you would compute the averages in order to calculate the AB, AC, and BC interactions. The three-factor interaction is tricky. If you look at this carefully, I think you can see that the plus runs you would use are the solid dots and the open circles are the minus runs, and each of those groups of four runs forms a figure called a regular tetrahedron. That's a regular geometric figure in three dimensions with four vertices, where each face is an equilateral triangle. If you form the superposition, if you will, of those two tetrahedra, you get the two groups of runs that form the positive and negative portions of the contrast that you use to calculate the ABC interaction. Here's an example of a 2³ factorial. This is Example 6.1 from the textbook. It's a 2³ factorial run on a process used in semiconductor manufacturing called plasma etching.
We have a tool that etches silicon wafers. The tool operates by putting the wafers into a chamber, pumping air out of that chamber so that you have a vacuum, and then introducing a mixture of gases. The gas used here is something called C2F6, and one factor we're controlling is its flow rate. The wafer sits between an anode and a cathode; the gap is another factor that we look at: how far apart are the anode and the cathode? Then the third factor is the amount of RF power applied. When the power is applied to the anode, it excites the electrons in the gas, and they attack the surface of the wafer and etch away the material that was deposited there previously. This is an interesting table. Take a look at it. This is the table of minus and plus signs for the 2³ factorial. If you look carefully, you'll notice that column A, column B, and column C are just the design matrix I showed you earlier in standard order. So if you want to calculate the effect of A, all you really need to do is take the signs in column A, attach them to the treatment-combination labels you see over here, and add them up, and that gives you the contrast for A. So the contrast for A is simply minus (1), plus a, minus b, plus ab, minus c, plus ac, minus bc, plus abc. You plug in the numbers for those treatment combinations, add them up, divide by half the number of runs, and there is the estimate of the A effect. You do the same thing for B and the same thing for C. So this table actually contains all of the contrasts. The contrast for AB is found by simply multiplying column A times column B. Notice that minus times minus is plus, plus times minus is minus, minus times plus is minus, plus times plus is plus, and so on. The AC contrast is column A times column C, the BC contrast is column B times column C, and the ABC contrast can be found by multiplying A times B times C. You can also find the ABC column by multiplying A times BC.
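This sign-table bookkeeping is easy to mechanize. In a sketch of my own (not from the textbook), the bits of the run index in standard order give the levels of A, B, C, so multiplying the relevant sign columns against the response totals yields the contrast, and from the contrast the effect and sum of squares:

```python
def contrast_effect_ss(y_totals, term, k, n):
    """Contrast, effect estimate, and sum of squares for a term such as
    "A", "AC", or "ABC" in a 2^k factorial with n replicates.
    y_totals: treatment-combination totals listed in standard order,
    i.e. (1), a, b, ab, c, ac, bc, abc for k = 3."""
    factor_idx = ["ABCDEFGH".index(f) for f in term]
    contrast = 0.0
    for run, y in enumerate(y_totals):
        sign = 1
        for j in factor_idx:
            # bit j of the run index is the level of factor j (0 = low, 1 = high)
            sign *= 1 if (run >> j) & 1 else -1
        contrast += sign * y
    effect = contrast / (n * 2 ** (k - 1))   # contrast / (half the total runs)
    ss = contrast ** 2 / (n * 2 ** k)        # contrast^2 / (total runs)
    return contrast, effect, ss
```

As a quick illustration with hypothetical totals: if y happens to equal the A sign column itself, [-1, 1, -1, 1, -1, 1, -1, 1] with one replicate, the A contrast is 8 and the A effect is 2, while every other contrast is zero by orthogonality.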
So this table can be used manually if you want to generate the contrasts. Of course, once you have a contrast, the contrast divided by half the number of runs is the effect estimate, and the contrast squared divided by the total number of runs is the sum of squares. So this really facilitates doing the analysis of variance by hand. Properties of the table: except for column I, the first column, which is called the identity column, every column has the same number of plus and minus signs. The sum of the products of signs in any two columns is always zero. Multiplying any column by the identity column leaves that column unchanged; that's why it's called an identity element. The product of any two columns in the table produces a column that's also in the table: A times B, for example, produces the AB column, and AB times BC would produce AB²C. But B² is the same as the identity column, so AB²C is just AC. These properties tell us that the 2^k design is an orthogonal design; the dot product of any two columns being zero is the key. Orthogonality is a very important property that is shared to some degree by all factorial designs. All factorial designs exhibit orthogonality in some way. Orthogonality makes interpretation easy, because when you calculate the effect of a factor, you are getting the unique effect of that factor by itself on the response. Here are the factor effects from this experiment. If you just look at the magnitudes of the effects, a couple of things really jump out at you, don't they? Factor A, factor C, and the AC interaction appear to be large. In fact, if you compute the sums of squares for those terms and then look at the percent contribution, you'll notice that those three terms account for well over 90 percent of the variability in the data from this experiment. So it tends to make you think that those three effects are very important. But of course we really need to do the statistical testing, don't we?
We really need to do this statistical testing to be sure that we know what's important.
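As a quick numerical check of the orthogonality and identity properties described above, here is a short sketch of my own: build the sign columns of the 2³ table in standard order, verify that distinct columns have zero dot product, and verify that AB times BC collapses to AC.

```python
# Sign columns of the 2^3 table in standard order:
# bit j of the run index gives the level of factor j (0 = low, 1 = high).
cols = {name: [1 if (run >> j) & 1 else -1 for run in range(8)]
        for j, name in enumerate("ABC")}

def col_product(u, v):
    """Elementwise product of two sign columns."""
    return [a * b for a, b in zip(u, v)]

cols["AB"] = col_product(cols["A"], cols["B"])
cols["BC"] = col_product(cols["B"], cols["C"])
cols["AC"] = col_product(cols["A"], cols["C"])

# Any two distinct columns are orthogonal (zero dot product) ...
assert sum(a * b for a, b in zip(cols["A"], cols["B"])) == 0
# ... and AB x BC = AB^2C = AC, since any squared column is the identity.
assert col_product(cols["AB"], cols["BC"]) == cols["AC"]
print("orthogonality checks pass")
```

This is exactly the property that lets each effect estimate isolate the unique contribution of its term.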