Hello everyone. In this video,

you will learn how to model and

format seasonality in demand forecasting.

We speculate that both holidays and

wedding seasons may stimulate the demand in cookware.

For example, during the holiday season,

that is, November to December,

people may buy cookware for themselves or as gifts.

In wedding, people often buy

cookware as a gift for the new couples.

The wedding census of the US clearly

indicates that the peak wedding seasons

are in the summer months.

Seasonality such as the holiday season

and the wedding season can only

choose a limited number of options

or categories such as the 12 months.

Thus, we should model

the seasonality as categorical variables.

In contrast, the continuous variables such as

price and home sales can be any fraction.

For seasonality, we use

binary or dummy variables

to indicate the month of the year.

For example, we use the binary variable Feb,

to indicate the month of February,

where Feb equals to one if

the month is February, zero otherwise.

In the figure on your right,

you can see how we format

the Feb variable for different month-year

of the data.

For the February of any year,

the Feb variable equals to one.

For all other months,

the Feb variable equals to zero.

The same format applies to

the binary variables of all the other months.

Because we have 12 months,

now the question is should we create

a binary variable for each month of the year?

The answer is no because of the famous N-1 rule,

which says that we cannot create a binary variable for

each category and we must

leave some categories out as the default.

For example, let's consider the gender variable,

which equals one for men, zero for women.

In this case, women is the default.

We should not create two binary variables,

one for man and another for women,

which is completely unnecessary and

actually creates a multi-collinearity issue.

Looking at the data,

we select the month January and September as default

because they're neither the peak month

nor the valley month for the demand.

Together with the time and price variables,

now we have totally 12 independent variables.

Because Excel Data Analysis Add-In allows

at most 16 independent variables, so we're fine.

Now, let's build a multiple regression model based on

all these 12 independent variables

by first clicking on Data,

then select Data Analysis,

and then select Regression to bring up

the Regression dialog box.

Now, in this box, let's first select

the sold units to be the input y-range.

Then, let's select

all the independent variables including

their labels on the first row as the input x-range.

Then we check the box of

labels, specifying the output range,

and then in the residuals box, check residuals,

residual plots, and line fit plots and then click "OK".

Let's now look at the outputs.

In the Summary Output,

we found that the R-square has increased to 94.9 percent.

The coefficients table to your right

shows that the time, sale price,

and November variables are very significant because

the p-values are smaller than 0.05.

The October variable is also

significant at the 0.1 significance level.

All other variables are not significant.

Now, let me explain the meanings of the coefficients.

The coefficient of time means that

the sold units increase by

15.58 units on average for every month.

The coefficient of the sale price means that on average,

the monthly sold units increase by

4.89 units per dollar of price drop.

The coefficient of November means that,

November increases the sold units by an average of

324 units relative to the default months,

which are January and September.

Similarly, October increases the sold units

by 259 units relative to the default months.

Based on the coefficients table,

we can determine the regression equation.

Please see the equation in

a table format to your left,

where the sold units equals to the intercept which is

4,855.14 plus the multiplications

between the coefficients and

the corresponding independent variables.

The line fit plot with respect to

time is shown on your right,

where you can see that the model fits the data very well.

The fit of the model can also be verified by

the scatter plot between

the predicted sold units and the actual sold units.

Here are the residuals of the model.

Overall, the model seems valid

because the residuals meet the conditions of linearity,

independence and equal variance,

except that there may exist

a slight curve in the residuals.

In summary, our analysis shows that time,

price and November have

a significant impact on the demand,

but the wedding season may not.

Together, these variables can explain

nearly 95 percent of the variation in the data.

Our residual analysis shows that there might

exist a slight n-shape curve in the residuals,

but other than this,

the model is largely valid.

Now, the question is how to make a forecast

for 2013 and test its accuracy?