10 One or More Categorical Predictors – Analysis of Variance

10.1 Comparing Groups

Analysis of variance (ANOVA) follows logically from the t‐test (Chapter 5), where the difference in the means of two groups is compared to the variation, or spread, within the two groups. ANOVA caters for cases where your categorical explanatory variable (sometimes called a ‘factor’) has more than two levels or groups. For example, while a t‐test could be used to test the difference in sales between two packaging colours for a product, ANOVA can be used if you have more than two colours to compare. Analogous to the t‐test, we focus on the comparison between two metrics: one characterising the differences between the groups, and the other the differences within the groups. Intuitively, this makes sense: the differences between groups alone do not allow us to make a judgement unless we also know the contextual or background variation. If you look at the group means in Figure 10.1a, it is not possible to state whether group 1 differs from group 2, for example. We can see that the mean for group 2 is higher than the one for group 1, but this is meaningless without knowing how the individual data points spread around the means. On the one hand, they could vary around the means as shown in Figure 10.1b, in which case the differences appear coincidental. We could also express this as ‘the variance within the groups is high relative to the variance between the groups’. On the other ...
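The comparison of between-group and within-group variation can be sketched numerically. The following is a minimal illustration in Python, not the book's own code: the function name `f_ratio` and both data sets are hypothetical, invented purely for this example. Both data sets imagine sales for three packaging colours with identical group means (10, 15 and 20) and differ only in how widely the individual points spread around those means.

```python
from statistics import mean


def f_ratio(groups):
    """Return (between-group mean square, within-group mean square, F ratio)."""
    k = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)        # total number of observations
    grand_mean = mean([x for g in groups for x in g])

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual points around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between, ms_within, ms_between / ms_within


# Hypothetical data: same group means, small spread within each group
tight = [[9, 10, 11], [14, 15, 16], [19, 20, 21]]
# Hypothetical data: same group means, large spread within each group
wide = [[2, 10, 18], [7, 15, 23], [12, 20, 28]]

for label, data in (("tight", tight), ("wide", wide)):
    ms_b, ms_w, f = f_ratio(data)
    print(f"{label}: between = {ms_b:.2f}, within = {ms_w:.2f}, F = {f:.2f}")
```

The between-group mean square is 75 in both cases because the group means are the same. Only the within-group mean square changes (1 versus 64), so the ratio drops from 75 to about 1.17. This is the sense in which identical differences between means can look convincing against a small background variation and coincidental against a large one.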
