You may have noticed that your software output specified a *Type __* Sum of Squares. This tutorial will explain what that means, and the differences between Type I, Type II, and Type III sum of squares using two-way factorial ANOVA as an example.

It can be shown that the \(F\) test for a one-way ANOVA is equivalent to comparing the full model to a reduced model. This is equivalent to asking if we are able to improve our prediction of the outcome by including the variable in the model, or if our guesses are just as noisy with the variable as without. The answer can be found by calculating:

\[ F = \frac{(E_R - E_F)/(df_R - df_F)}{E_{F^*}/df_{F^*}} \]

We will refer to the sum of squared errors from the full model - that is, the model with the variable we are testing - as \(E_F\). The reduced (or restricted) model is the same as the full model, except that the variable being tested is removed; its sum of squared errors is \(E_R\). These values are also known as the “within” or “residual” sums of squares reported in the ANOVA output of any statistical package.

\(E_{F^*}\) and \(df_{F^*}\) refer to the complete model, which includes all of the variables and interactions we want to consider. In the context of two-way factorial ANOVA, the complete model includes \(\alpha\), \(\beta\), and their interaction.

The degrees of freedom refer to the number of observations we have minus the number of parameters we need to estimate. By *parameter* here we are referring to means. If we have \(k\) different cells in our design, the full model has \(N - k\) degrees of freedom. \(df_F\) refers to this degrees of freedom value, and \(df_R\) refers to the degrees of freedom of the reduced model.

The wonderful thing about this approach is that it turns out we can *always* get the appropriate \(F\) statistic if we use the above formula.
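As a sketch of this model-comparison logic, the formula translates directly into code. The snippet below (Python, purely for illustration; the function name is our own) computes the omnibus test of the complete two-way model against a grand-mean-only model, using the error terms reported in the table later in this tutorial:

```python
def model_comparison_f(e_r, df_r, e_f, df_f, e_star, df_star):
    """F statistic for comparing a reduced model to a fuller model.

    Numerator: reduction in squared error per degree of freedom spent.
    Denominator: mean squared error of the complete model.
    """
    return ((e_r - e_f) / (df_r - df_f)) / (e_star / df_star)

# Grand-mean model (E_R = 1209.2, df_R = 44) vs. the complete model
# (E_F = 817.764, df_F = 39); here the full model IS the complete model.
f = model_comparison_f(e_r=1209.2, df_r=44,
                       e_f=817.764, df_f=39,
                       e_star=817.764, df_star=39)
print(round(f, 3))  # 3.734
```

Every test in this tutorial is just a different choice of reduced and full model plugged into this one function.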

How does this play out in the case of a two-way factorial ANOVA?

**Our model:**

\[ Y_i = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \epsilon_i \]

We have a first factor, \(\alpha_j\), and a second factor, \(\beta_k\), and the interaction, \((\alpha \beta)_{jk}\).

We will use the `Moore` dataset from the `carData` package in R (John Fox, Sanford Weisberg and Brad Price (2018). carData: Companion to Applied Regression Data Sets. R package version 3.0-2. https://CRAN.R-project.org/package=carData). In this model, `conformity` is our dependent variable, and `partner.status` (\(\alpha\)) and `fcategory` (\(\beta\)) are our independent variables. The following table presents the values of the within, or error, sum of squares we would get from fitting models with different combinations of terms. These are the error terms we will use in our formulas for the F-test using each type of sum of squares.

Model | \(SS_e\) | \(df_e\) |
---|---|---|
\(Y_i = \mu\) | 1209.2 | 44 |
\(Y_i = \mu + partner.status_j\) | 1004.868 | 43 |
\(Y_i = \mu + fcategory_k\) | 1205.467 | 42 |
\(Y_i = \mu + partner.status_j + fcategory_k\) | 993.253 | 41 |
\(Y_i = \mu + fcategory_k + (partner.status*fcategory)_{jk}\) | 1057.326 | 40 |
\(Y_i = \mu + partner.status_j + (partner.status*fcategory)_{jk}\) | 853.783 | 41 |
\(Y_i = \mu + partner.status_j + fcategory_k + (partner.status*fcategory)_{jk}\) | 817.764 | 39 |
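For the hand calculations that follow, it helps to keep the table's error terms in code. This Python snippet is purely bookkeeping (the abbreviations `ps`, `fc`, and `ps:fc` are our own shorthand, not R syntax); it also computes the denominator that every F test below shares:

```python
# (SS_e, df_e) for each model in the table above.
# "ps" = partner.status, "fc" = fcategory, "ps:fc" = their interaction.
errors = {
    "mu":              (1209.2,   44),
    "ps":              (1004.868, 43),
    "fc":              (1205.467, 42),
    "ps + fc":         (993.253,  41),
    "fc + ps:fc":      (1057.326, 40),
    "ps + ps:fc":      (853.783,  41),
    "ps + fc + ps:fc": (817.764,  39),  # the complete model
}

# All three types of sums of squares use the complete model's mean
# squared error as the F-test denominator.
e_star, df_star = errors["ps + fc + ps:fc"]
ms_error = e_star / df_star
print(round(ms_error, 3))  # 20.968
```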

### Type I Sum of Squares

Here is one approach:

- First, compare a model with *only* Factor 1 to a model with only the grand mean.
- Next, compare a model with Factor 1 *and* Factor 2 to a model with just Factor 1.
- Finally, compare a model with both factors plus the interaction to a model with both factors and no interaction.

This approach yields the *Type-I (Sequential) sum of squares.* The order of the factors matters with this approach, and different orders will yield varying results. If we were to run the two-way factorial ANOVA using the Type-I sum of squares we would get the following table:

\(F\) Test | \(SS_w\) | \(df_w\) | \(MS_w\) | \(F\) | \(p\) |
---|---|---|---|---|---|
\(F_{partner.status}\) | 204.332 | 1 | 204.332 | 9.745 | .003 |
\(F_{fcategory}\) | 11.615 | 2 | 5.807 | .277 | .760 |
\(F_{partner.status*fcategory}\) | 175.489 | 2 | 87.744 | 4.185 | .023 |

We will use this formula to hand calculate our \(F\) statistics:

\[ F = \frac{(E_R - E_F)/(df_R - df_F)}{E_{F^*}/df_{F^*}} \]

- \(F_{partner.status} = \frac{(1209.2 - 1004.868) / (44-43)}{817.764/39} = 9.745\)
- \(F_{fcategory} = \frac{(1004.868 - 993.253) / (43-41)}{817.764/39} = 0.277\)
- \(F_{partner.status*fcategory} = \frac{(993.253 - 817.764) / (41-39)}{817.764/39} = 4.185\)

Note that in each case the denominator is based on the within sum of squares from the model containing all three terms.
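The three hand calculations above can be checked with a few lines of Python; the only thing the code adds is the arithmetic, as every number comes from the error-term table:

```python
# Complete-model mean squared error: the shared denominator.
ms_error = 817.764 / 39

# Type-I (sequential) comparisons, in the order the terms were entered.
f_ps  = (1209.2   - 1004.868) / (44 - 43) / ms_error  # partner.status
f_fc  = (1004.868 - 993.253)  / (43 - 41) / ms_error  # fcategory
f_int = (993.253  - 817.764)  / (41 - 39) / ms_error  # interaction

print(round(f_ps, 3), round(f_fc, 3), round(f_int, 3))  # 9.745 0.277 4.185
```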

### Type II Sum of Squares

Type-II sums of squares are calculated as follows:

- First, compare a model with *only* Factor 2 to a model with both Factor 1 and Factor 2.
- Next, compare a model with *only* Factor 1 to a model with both Factor 1 and Factor 2.
- Finally, compare the full model with both factors and the interaction to the model with only Factor 1 and Factor 2.

ANOVA table:

\(F\) Test | \(SS_w\) | \(df_w\) | \(MS_w\) | \(F\) | \(p\) |
---|---|---|---|---|---|
\(F_{partner.status}\) | 212.214 | 1 | 212.214 | 10.121 | .003 |
\(F_{fcategory}\) | 11.615 | 2 | 5.807 | .277 | .760 |
\(F_{partner.status*fcategory}\) | 175.489 | 2 | 87.744 | 4.185 | .023 |

We will use this formula to hand calculate our \(F\) statistics:

\[ F = \frac{(E_R - E_F)/(df_R - df_F)}{E_{F^*}/df_{F^*}} \]

- \(F_{partner.status} = \frac{(1205.467 - 993.253) / (42-41)}{817.764/39} = 10.121\)
- \(F_{fcategory} = \frac{(1004.868 - 993.253) / (43-41)}{817.764/39} = 0.277\)
- \(F_{partner.status*fcategory} = \frac{(993.253 - 817.764) / (41-39)}{817.764/39} = 4.185\)
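As with Type I, the Type-II arithmetic can be verified in a few lines of Python; each comparison drops one term from the two-main-effects model (the interaction test is identical to the Type-I one):

```python
# Complete-model mean squared error: the shared denominator.
ms_error = 817.764 / 39

# Type-II comparisons: each main effect is tested against the model
# containing the other main effect; dfs come from the error-term table.
f_ps  = (1205.467 - 993.253) / (42 - 41) / ms_error  # partner.status
f_fc  = (1004.868 - 993.253) / (43 - 41) / ms_error  # fcategory
f_int = (993.253  - 817.764) / (41 - 39) / ms_error  # interaction

print(round(f_ps, 3), round(f_fc, 3), round(f_int, 3))  # 10.121 0.277 4.185
```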

### Type III Sum of Squares

Type-III sums of squares are calculated as follows:

- First compare the full model with both factors and interaction to the model with only factor 2 and the interaction.
- Then, compare the full model with both factors and interaction to the model with only factor 1 and the interaction.
- Finally, compare the full model with both factors and interaction to the model with only factor 1 and factor 2.

ANOVA Table:

\(F\) Test | \(SS_w\) | \(df_w\) | \(MS_w\) | \(F\) | \(p\) |
---|---|---|---|---|---|
\(F_{partner.status}\) | 239.562 | 1 | 239.562 | 11.425 | .002 |
\(F_{fcategory}\) | 36.019 | 2 | 18.009 | .859 | .431 |
\(F_{partner.status*fcategory}\) | 175.489 | 2 | 87.744 | 4.185 | .023 |

We will use this formula for the \(F\) statistics:

\[ F = \frac{(E_R - E_F)/(df_R - df_F)}{E_{F^*}/df_{F^*}} \]

- \(F_{partner.status} = \frac{(1057.326 - 817.764) / (40-39)}{817.764/39} = 11.425\)
- \(F_{fcategory} = \frac{(853.783 - 817.764) / (41-39)}{817.764/39} = 0.859\)
- \(F_{partner.status*fcategory} = \frac{(993.253 - 817.764) / (41-39)}{817.764/39} = 4.185\)
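And the Type-III arithmetic in Python; here each reduced model drops one term from the complete model while keeping everything else, including the interaction:

```python
# Complete-model mean squared error: the shared denominator.
ms_error = 817.764 / 39

# Type-III comparisons: each term is tested against the complete model
# with only that term removed; dfs come from the error-term table.
f_ps  = (1057.326 - 817.764) / (40 - 39) / ms_error  # partner.status
f_fc  = (853.783  - 817.764) / (41 - 39) / ms_error  # fcategory
f_int = (993.253  - 817.764) / (41 - 39) / ms_error  # interaction

print(round(f_ps, 3), round(f_fc, 3), round(f_int, 3))  # 11.425 0.859 4.185
```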

### More about types of sums of squares

If all of the cells in the full factorial design have the same \(n\), then the three approaches yield identical results. Most of the time, however, your cells will not all have the same \(n\).

So which type should you use? Type-I sums of squares are appropriate if you are interested in the incremental effect of adding terms to a model, but the conclusions will depend on the order in which the terms are entered. If there is truly no interaction, Type-II and Type-III give the same tests for the main effects, and Type-II has more power. But the interaction effect was added for a reason, so in the end you will typically use the Type-III sums of squares (SPSS defaults to this).