Welcome. In this lecture you will learn how to specify and select a good econometric model. We start with the familiar model where the dependent variable y is explained by a set of variables collected in X. Here, y can be a stock index return and X a number of variables that may explain movements in the stock index. We write this relationship either explicitly for each individual observation, as done at the top of the page here, or in matrix form, as was discussed in lecture two. The question we'll address now is which variables to include in the matrix X. It turns out that we face a tough trade-off. If one considers a model with a small number of variables, there is the risk that relevant variables are missed, so that too few variables are included. This leads to an estimation bias. If one, however, considers a model with too many variables, there is an efficiency loss.

Now let us develop the results from the previous slide. We compare two models. Suppose that the data-generating process, DGP for short, contains two groups of explanatory variables, X1 and X2. We contrast this with the actually estimated model, which contains only X1. In this model we denote the estimator of beta1 by b(r), where r stands for restricted, as we have restricted beta2 to zero. Also, a tilde is added to the disturbance term to indicate that it differs from the one in the DGP. The estimators of beta1 and beta2 in the DGP are denoted by b(1) and b(2). Now I invite you to answer the following test question to relate the expected value of b(r) to the values beta1 and beta2. The answer uses the famous OLS formula for the regression of y on X1 alone. Then we plug in the expression for y. Here it is key to use the equation that generated the data, so with both X1 and X2. Then we split this into three parts. The first part becomes beta1, the second part a certain matrix, P, times beta2, and the last part is zero. This gives us the first result.
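To make this result concrete, here is a small simulation sketch in Python (not part of the lecture itself; the data, coefficient values, and variable names are all hypothetical). It regresses y on x1 alone while the DGP also contains a correlated x2, so the average of b(r) should equal beta1 plus P times beta2:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000
beta1, beta2 = 1.0, 0.5

biases = []
for _ in range(reps):
    # Correlated regressors: omitting x2 will bias the estimate of beta1
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(size=n)   # correlation with x1 drives the bias
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    # Restricted OLS: regress y on x1 only (beta2 restricted to zero)
    b_r = (x1 @ y) / (x1 @ x1)
    biases.append(b_r - beta1)

# E[b(r)] = beta1 + P * beta2 with P = (x1'x1)^{-1} x1'x2, here about 0.8,
# so the average bias should be close to 0.8 * 0.5 = 0.4
print(np.mean(biases))
```

Setting beta2 to zero in the DGP, or making x2 orthogonal to x1, would drive the average bias to zero, in line with the result on the next slide.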
The restricted estimator will be biased unless beta2 is zero, or X1 and X2 are completely orthogonal, so that the product X1'X2, and thus P, is zero. We refer to this bias as the omitted variable bias.

Now we turn to the efficiency part. Efficiency concerns the variance of our estimators. We prefer estimators that have no or small bias combined with low variance. An estimator with the lowest possible variance is called efficient. The variance of the restricted estimator, b(r), is equal to the variance of the unrestricted estimator, b(1), minus a positive semi-definite term, so that the variance of b(1) is always at least as large as that of b(r). This result will be considered further in this lecture's training exercise. While the benefit of adding variables is bias reduction, the cost is thus increased variance.

The next step is to translate this finding into measures, or metrics, that we can use to find a good trade-off between bias and efficiency. We turn to two commonly used decision metrics: information criteria and out-of-sample prediction. Often there is a preference for small models, in the sense that a limited number of variables is included. When adding variables, at a certain stage the added benefit of yet another variable will be relatively small, and it is good to stop adding variables to the model. Information criteria capture this idea. They consider the goodness of fit of a model, here captured by the standard error of the regression as defined in lecture two, but impose a penalty on the number of parameters k. Two commonly used information criteria are the Akaike information criterion, abbreviated AIC, and the Bayesian information criterion, abbreviated BIC. For both the AIC and the BIC the value equals the log of the squared standard error of the regression plus a term that is a function of k, the number of variables in the model. The two information criteria differ in the penalty they impose on the number of parameters.
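As a sketch, both criteria can be computed directly from an OLS fit. The code below assumes, as in lecture two, that the squared standard error of the regression is s² = e'e/(n−k); the simulated data and names are purely illustrative:

```python
import numpy as np

def info_criteria(y, X):
    """AIC and BIC as in this lecture: log(s^2) plus a penalty term,
    where s^2 = e'e / (n - k) is the squared standard error of the regression."""
    n, k = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS estimates
    e = y - X @ b                              # residuals
    s2 = (e @ e) / (n - k)                     # squared standard error
    aic = np.log(s2) + 2 * k / n
    bic = np.log(s2) + k * np.log(n) / n
    return aic, bic

# Hypothetical example: the DGP uses all three regressors; compare the
# small model (omitting one relevant variable) with the full model
rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=n)

aic_small, bic_small = info_criteria(y, X[:, :2])  # omits the third column
aic_full, bic_full = info_criteria(y, X)           # uses all regressors
```

Here the omitted variable matters enough that the full model attains the lower value on both criteria, despite its larger penalty term.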
When comparing models, a lower value of an information criterion is preferred, as we aim for a low standard error of the regression. Now I invite you to examine which information criterion imposes the strongest penalty on the number of variables. The penalty on the number of parameters k is 2/n for the AIC and log(n)/n for the BIC. Thus which criterion imposes the strongest penalty depends on the number of observations n. When log(n) is larger than 2, the BIC imposes the stronger penalty. This happens when n is larger than e squared, which is about 7.4, from which we conclude that for eight or more observations the BIC imposes a stronger penalty than the AIC.

The information criteria are based on so-called in-sample results: they use all observations in the sample. Often we are also interested in the predictive performance of our model. This can be in a time-series sense, for example when we want to forecast a stock price to earn some money, but also when we have data on households and want to predict whether they will buy a certain product or not. In such cases, the full sample can be split into an in-sample part, often referred to as the training sample, and an out-of-sample part. The observations in the out-of-sample part are kept out of the main analysis, for example when estimating beta, and are only used to examine the predictive ability of the model. Two commonly used out-of-sample criteria are the root mean squared error, RMSE, and the mean absolute error, MAE. Both criteria consider the difference between the actual observation y(i) and the predicted value y-hat(i), but they differ slightly in how the prediction errors are averaged. In both cases, a lower value means a better model.

Now let us return to the problem a researcher faces: how to decide which variables to include in X.
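The sample split and the two out-of-sample criteria can be sketched as follows (an illustrative example, not the lecture's code; the 80/20 split and the simulated data are assumptions):

```python
import numpy as np

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))   # root mean squared error

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))           # mean absolute error

# Hypothetical split: first 80 observations form the training sample,
# the last 20 are held out for out-of-sample evaluation
rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

X_in, y_in = X[:80], y[:80]        # in-sample (training) part
X_out, y_out = X[80:], y[80:]      # out-of-sample part, kept out of estimation

b = np.linalg.lstsq(X_in, y_in, rcond=None)[0]  # estimate beta in-sample only
yhat_out = X_out @ b                             # predict the held-out part

print(rmse(y_out, yhat_out), mae(y_out, yhat_out))
```

Because squaring weights large errors more heavily, the RMSE is never smaller than the MAE on the same predictions; which one to report depends on how costly large prediction errors are in the application.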
If you consider removing a group of regressors, you can use an F-test for the joint significance of the corresponding coefficients, or simply a t-test if you wish to remove only a single variable. However, be aware that these tests are concerned only with significance and do not incorporate the bias-efficiency trade-off. If you already have a set of candidate models that differ in the number of parameters, information criteria can be of use. These take into account that small models are preferred if more complex models do not perform sufficiently better. Here you can also consider using out-of-sample prediction: if the goal is prediction and there are a number of candidate models, you may as well pick the one with the most predictive power, that is, the one with the lowest root mean squared error or mean absolute error.

Very often, however, we are not fortunate enough to start with two groups of regressors, X1 and X2, or with a candidate set of models, and we first need to arrive at a single model. In this case, iterative selection methods can be of great help. These come in two variants: general-to-specific and specific-to-general. In the first, you start with the most general model, including as many variables as are at hand. Then you check whether one or more variables can be removed from the model. This can be based on individual t-tests, or on a joint F-test in the case of multiple variables. If you remove one variable at a time, the variable with the lowest absolute t-value is removed from the model. The model is estimated again without that variable, and the procedure is repeated until all remaining variables are significant. The specific-to-general approach follows the same logic but starts with a very small model, sometimes consisting of only the constant term. Variables are added one at a time, choosing the one with the largest absolute t-statistic. This procedure is repeated until no significant variables can be added anymore.
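The general-to-specific procedure described above can be sketched in a few lines (a simplified illustration, not the lecture's code; the 1.96 critical value, the simulated data, and the variable names are assumptions, and real applications often protect the constant from removal):

```python
import numpy as np

def backward_elimination(y, X, names, t_crit=1.96):
    """General-to-specific: repeatedly drop the regressor with the smallest
    absolute t-value, re-estimate, and stop when all remaining variables
    are significant at the chosen critical value."""
    names = list(names)
    while X.shape[1] > 0:
        n, k = X.shape
        XtX_inv = np.linalg.inv(X.T @ X)
        b = XtX_inv @ X.T @ y                  # OLS estimates
        e = y - X @ b
        s2 = (e @ e) / (n - k)                 # residual variance
        se = np.sqrt(s2 * np.diag(XtX_inv))    # standard errors
        t = b / se
        j = np.argmin(np.abs(t))               # least significant variable
        if abs(t[j]) >= t_crit:                # everything significant: stop
            break
        X = np.delete(X, j, axis=1)            # drop it and re-estimate
        names.pop(j)
    return names

# Hypothetical data: only the constant and the first regressor matter
rng = np.random.default_rng(3)
n = 200
Z = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * Z[:, 0] + rng.normal(size=n)
X = np.column_stack([np.ones(n), Z])

selected = backward_elimination(y, X, ["const", "z1", "z2", "z3"])
print(selected)
```

The specific-to-general variant would instead loop over candidate additions and keep the variable with the largest absolute t-statistic at each step.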
Both procedures have pros and cons. The specific-to-general approach starts small, which is appealing. However, many variants need to be tried in the initial steps. Also, it can easily happen that important variables are still missing in the initial phases, so that the initial tests are performed on misspecified models. Now I invite you to complete the training exercise, to train yourself in the topics of this lecture. You can find this exercise on the website. And this concludes our lecture on specification.