Welcome back. In this module, we're going to talk about experimental design, and we're going to revisit some of the concepts we covered in a previous course, but now with a bit more technical emphasis. The goal of experimental design is twofold: first, to induce the human subject to do the things you need them to do in the scanner; and second, to effectively detect the brain signals related to those psychological states. You get to control what to present and when. There are two kinds of considerations here, psychological and statistical. We're going to focus now on the statistical considerations, because it's possible to have a beautiful study at the psychological level, one that induces an amazing, very strong effect in the brain, and yet be completely unable to detect that effect and get any results because of the way you designed the study and its match with the fMRI environment.

I'm going to talk about eight principles of fMRI design over the next couple of lectures: sample size, scan time, the number of conditions you test, how events are grouped into those conditions, the temporal frequencies at which the task varies, the randomization of events, issues surrounding nonlinearity in the BOLD signal, and ways of optimizing your design.

Let's review the structural model for the GLM, which is Y = X beta + error. We have some outcome data Y, which is really a series of observations; the example shown here is not an fMRI example, but it works just fine for us. Then there's the design matrix X, which usually contains an intercept plus predictors, and those predictors can be continuous or categorical. They're multiplied by the model parameters, one slope for each regressor in the design; these are the things we estimate when we fit the model. Finally, there are the residuals, everything left over.

Now let's look at the algebraic foundations of design efficiency: what makes a more powerful design? That depends on the magnitude of the effect, the contrast times beta hat, divided by its standard error, which is a measure of variability due to noise. For example, the t statistic, our bread-and-butter statistic in brain imaging, is beta hat divided by its standard error. Let's take a closer look at that standard error. It equals sigma hat, the residual noise, times the square root of the diagonals of (X'X)^-1, one for each parameter estimate. Sigma is the scanner noise, the residual noise that you can't account for in any other way; it could be due to head movement, or it could be something else. The other part is the design efficiency, and it's related to the design matrix itself, which you know before you ever collect any data. An inefficient design increases your standard errors and decreases your power.

So you don't need any data to estimate efficiency. It doesn't depend on the data at all, as long as the form of the model is correct, and we'll assume for this part of the lecture that we have the correct model. Given that, we can optimize the efficiency of the design. To minimize the standard error, we can reduce the noise in the scanner in a variety of ways, or we can make the design more efficient. What's useful here is that we can then define efficiency as a metric: one over the design-related component of the standard error. Understanding and maximizing efficiency is one of the foundations of experimental design.
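To make that concrete, here is a minimal sketch in Python (with NumPy) of computing the efficiency metric from a design matrix alone, before any data are collected. The toy boxcar design, run length, and block length are illustrative assumptions, not the lecture's actual design.

```python
import numpy as np

# Minimal sketch of design efficiency, assuming a toy on/off boxcar design.
# No data are needed: efficiency depends only on the design matrix X.

n = 200  # number of time points (scans); illustrative value

# Toy design matrix: intercept plus one boxcar regressor (10 off, 10 on, repeated)
boxcar = np.tile(np.concatenate([np.zeros(10), np.ones(10)]), n // 20)
X = np.column_stack([np.ones(n), boxcar])

# Design-related component of the standard error:
# the square root of the diagonals of (X'X)^-1, one value per parameter
design_se = np.sqrt(np.diag(np.linalg.inv(X.T @ X)))

# Efficiency = 1 / (design-related component of the standard error)
efficiency = 1.0 / design_se
print("design SE component per parameter:", design_se)
print("efficiency per parameter:", efficiency)
```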
Maximizing the average efficiency for a series of effects that you care about, beta hats or contrasts, is called A-optimality, and that's directly related to statistical power. It isn't the only possible criterion; there are others, like D-optimality. But we'll focus on A-optimality here, because it's really quite useful for enhancing efficiency and power in a first-order kind of way.

Let's look at an example event-related design with four event types. We've got four intermixed event types with jitter in between, four regressors, one for each event type, and four estimated beta hats. Now let's look at that (X'X)^-1 matrix that's so critical for the standard error. The diagonals of that matrix are related to the error variances of those beta hats; higher values there mean a less efficient design and lower power. The off-diagonals are related to the error covariances, the way the parameters trade off. Higher values in the off-diagonals mean the parameter estimates are correlated, so I'm not sure which ones are actually driving the results, or should be. And finally, note the intercept: it also has its own variance, which determines the fundamental stability with which I can estimate the baseline level.

So what makes those diagonal values smaller and designs more efficient? There are three factors we'll cover here. One is a large rise and fall in the predictors themselves; that's predictor variance. Second is low covariance among predictors, or orthogonal predictors; in other words, a minimal multicollinearity problem. And third is a large sample size: the standard error shrinks with the square root of the number of observations. All of that is factored into that matrix.

In the fMRI setting, it's a little more complex, but the principle is exactly the same: we have to factor in contrasts, high-pass filtering, and autocorrelated noise in the scanner. Take our four-column design matrix and multiply the parameter estimates by the contrast weights [1 1 -1 -1]. That's equivalent to testing the main effect of factor 1 in this 2-by-2 factorial design, from the previous class. Now let's look at the variance of that contrast estimate. All we have to do is take that (X'X)^-1 and sandwich it between the contrast and its transpose, giving c (X'X)^-1 c'. The t statistic becomes the contrast times beta hat divided by its standard error. In this way, we can factor in any contrast or set of contrasts we care about. C can also be a matrix with one column per contrast, and in that case all the contrasts will be evaluated independently of one another; that's how the linear algebra works out.

Now let's factor in high-pass filtering, which is a good thing to apply. We can define a high-pass filtering matrix K, which is a linear operator, and apply K to the design matrix X to get a filtered design matrix, which we'll call Z. Substituting in Z gives an expression for the variance of a contrast, or set of contrasts, that accounts for high-pass filtering. Finally, we'll factor in autocorrelation. Again, the formula is more complex, but what I'm essentially doing is defining an autocorrelation matrix V, which contains the known, or estimated, colored noise in the scanner.
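Here is a rough numerical sketch of those pieces combined. The random regressors stand in for HRF-convolved event predictors, and the discrete-cosine filter and AR(1) autocorrelation value are illustrative assumptions, not what any particular software package does by default.

```python
import numpy as np

# Sketch: design-related variance of a contrast under high-pass filtering
# and autocorrelated (colored) noise. All specific values are assumptions.

n = 200
t = np.arange(n)
rng = np.random.default_rng(0)

# Intercept plus four event-type regressors (random stand-ins for
# HRF-convolved predictors)
X = np.column_stack([np.ones(n), rng.standard_normal((n, 4))])

# High-pass filter K: project out the four lowest discrete-cosine components
dct = np.column_stack([np.cos(np.pi * k * (t + 0.5) / n) for k in range(1, 5)])
K = np.eye(n) - dct @ np.linalg.pinv(dct)

Z = K @ X  # filtered design matrix

# Contrast: main effect of factor 1 in the 2x2 design (intercept weighted 0)
c = np.array([0.0, 1, 1, -1, -1])

# Autocorrelation matrix V for AR(1) noise with rho = 0.3
V = 0.3 ** np.abs(np.subtract.outer(t, t))

# Sandwich formula: Var(c beta_hat) is proportional to
# c (Z'Z)^-1 Z' V Z (Z'Z)^-1 c'
ZtZ_inv = np.linalg.inv(Z.T @ Z)
var_contrast = c @ ZtZ_inv @ Z.T @ V @ Z @ ZtZ_inv @ c
print("design-related contrast variance:", var_contrast)
print("efficiency:", 1.0 / var_contrast)
```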
And now I just take that entire expression and sandwich it around the V matrix, which gives me an estimate of the variance under colored noise. One common trick here is to use the pseudo-inverse instead of a standard inverse. We'll denote that Z+, the pseudo-inverse of Z, which is the same as (Z'Z)^-1 Z' when the design is not rank deficient. Many programs actually use this instead of a standard inverse, just to note.

So now I've got an expression for the variance of a contrast, or set of contrasts, that accounts for all of these factors, and the corresponding efficiency is proportional to statistical power. It's not the same as power, but it can be converted to power given a reference effect size of interest.

Now let's go one more level of depth and look at efficiency in a multilevel setting. In a group analysis, efficiency and power don't just depend on the observations within a person; they also depend on the variation between people and on the sample size. So power depends on both within-person and between-person variance. What we're looking at is an expression for the variance of a contrast of interest, and it breaks down into the following pieces. The first term, sigma squared within times c (X'X)^-1 c', is the within-person error variance; that's the efficiency within that person. To that, we have to add a piece related to the variance of that contrast between people: the real individual differences that we have to average over when we do the test in a group setting. And then there's the sample size. This all boils down to a group standard error that is proportional to the square root of the sum of the within-person and between-person variances, divided by the sample size.

So increasing within-person efficiency, by collecting more data within a person, for example, helps up to a point. Increasing the sample size always helps, because at the end of the day we always divide by the square root of the sample size N. And the greater the between-person variance relative to the within-person variance, the more the sample size really matters. At a certain point, I can keep collecting data within a person, and the contribution of the within-person error will drop essentially to zero if I collect enough data. In that case, power is really limited by the number of subjects I collect, and that's often the bigger constraint on power. Even if the efficiency at the first level is infinite, power at the group level is limited by the square root of the between-subjects variance divided by the sample size.

That's the end of this module. In the next module, we're going to continue looking at design efficiency and optimization.
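As a small illustration of that trade-off, here is a sketch of the group-level standard error with made-up variance values; it shows how between-person variance puts a floor on power no matter how efficient the first-level design is.

```python
import numpy as np

# Sketch of the group-level standard error of a contrast:
# sqrt((within-person variance + between-person variance) / N).
# All numbers below are illustrative assumptions.

def group_se(sigma2_within_contrast, sigma2_between, n_subjects):
    """Standard error of a group contrast estimate.

    sigma2_within_contrast -- within-person noise variance times the
                              design term c (X'X)^-1 c'
    sigma2_between         -- true between-person variance of the contrast
    n_subjects             -- sample size N
    """
    return np.sqrt((sigma2_within_contrast + sigma2_between) / n_subjects)

print(group_se(0.8, 0.5, 20))  # typical case
print(group_se(0.0, 0.5, 20))  # infinite first-level efficiency:
                               # SE is floored at sqrt(sigma2_between / N)
```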