ANOVA/Linear Model Example

Author

Dave Bridges

Published

May 9, 2014

Explanation of How to Do An ANOVA or Linear Model

Here is some sample data we will use throughout. In this case there are three colors, 2 shapes and some sizes. This analysis presumes size is a continuous independent variable and shape/color is a discrete dependent variable

Show the code
set.seed(45)
balloon.data <- data.frame(color=as.factor(c(rep("green",6), rep("red",12), rep("blue",6))),
                          shape = as.factor(rep(c("square", "circle"),12)),
                          size = rpois(24, lambda=5)*abs(rnorm(24)))

One Way ANOVA

This is used when you have one variable, with either predictor. That predictor could have several levels. In this example you have three colors of balloons and are testing the hypothesis that size is affected by balloon color.

I prefer to use the broom package to make nice visualizations of the summary tables.

Show the code
library(broom) #for tidy
library(dplyr) #for chaining commands

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
Show the code
library(knitr) #for kables

#one way anova (two types)
aov(size~color, data=balloon.data) %>% tidy %>% kable(caption="ANOVA testing effects of shape on size")
ANOVA testing effects of shape on size
term df sumsq meansq statistic p.value
color 2 13.24855 6.624277 0.4861405 0.6217469
Residuals 21 286.15152 13.626263 NA NA

Interpreting a Univariable ANOVA

This is what is konwn as an omnibus test, so it is not saying whether there are any specific between-group differences (not testing red vs blue), but instead is testing whether there is a general difference between groups (ballon color matters). In this case the p-value is 0.6217469 so we would not consider this a significant effect of color on size.

Two-Way ANOVA

This is used when you have two (or more) potential variables which affect your dependent variable. This is also known as a multivariable regression. Here are two ways to analyse this data, the first presumes independence between shape and color, the second is testing whether there is an interaction between shape and color (e.g. blue squares are different from what you would expect from something being blue + being a square). For interactions you can write this as color*shape or color + shape + color:shape.

Show the code
#two way anova, with interaction
aov(size~color+shape, data=balloon.data) %>% tidy %>% kable(caption="2x2 ANOVA testing for effects of color and shape on size")
2x2 ANOVA testing for effects of color and shape on size
term df sumsq meansq statistic p.value
color 2 13.24855 6.624277 0.5010345 0.6133087
shape 1 21.72753 21.727529 1.6433856 0.2145235
Residuals 20 264.42399 13.221199 NA NA
Show the code
#two way anova, without interaction
aov(size~color*shape, data=balloon.data) %>% tidy %>% kable(caption="2x2 ANOVA testing for effects of color and shape on size, including and interaction term")
2x2 ANOVA testing for effects of color and shape on size, including and interaction term
term df sumsq meansq statistic p.value
color 2 13.24855 6.624277 0.4784560 0.6273996
shape 1 21.72753 21.727529 1.5693283 0.2263358
color:shape 2 15.21194 7.605971 0.5493614 0.5866966
Residuals 18 249.21204 13.845114 NA NA

Interactions

Our typical practice is to first test for an interaction, then based on whether it is significant (generally \(\alpha<0.05\)). If the interaction term is significant then the two main effects are not useful, so just report the interaction term then do a pairwise analysis of each of the groups.

For a more detailed discussion of interpreting interactions see Nieuwenhuis, Forstmann, and Wagenmakers (2011)

Looking at Individual Factor Contributions

The omnibus ANOVA analysis just says there is some difference between groups but does not specify how the groups differ. This can be done using the lm command.

Show the code
lm(size~color, data=balloon.data) %>% 
  tidy %>% 
  kable(caption="2x2 linear model testing for effects of color and shape on size.  This does not include the reference value for color (blue), which instead is included in the intercept")
2x2 linear model testing for effects of color and shape on size. This does not include the reference value for color (blue), which instead is included in the intercept
term estimate std.error statistic p.value
(Intercept) 2.1695956 1.506998 1.4396802 0.1646993
colorgreen 0.6426212 2.131217 0.3015278 0.7659777
colorred 1.7360938 1.845688 0.9406213 0.3575939
Show the code
lm(size~color+0, data=balloon.data) %>% 
  tidy %>% 
  kable(caption="2x2 linear model testing for effects of color and shape on size including the reference value")
2x2 linear model testing for effects of color and shape on size including the reference value
term estimate std.error statistic p.value
colorblue 2.169596 1.506998 1.439680 0.1646993
colorgreen 2.812217 1.506998 1.866105 0.0760561
colorred 3.905689 1.065609 3.665219 0.0014424

Assumptions of ANOVA Analyses

Although when group sizes are equal, ANOVA is quite robust to non-normality and non-equal variance, you should test these assumptions.

Testing Normality

Check Distribution of Your Data

One way is to test the normality of each group of your data. First it is best to do this visually, for example with a density plot:

Show the code
library(ggplot2)

ggplot(balloon.data,aes(x=size,fill=color)) +
  geom_density(alpha=0.5) +
  scale_fill_manual(values=c('blue','green','red'))

Visually you are looking for a nice bell curve (such as blue but not red).

Another useful plot is what is known as a qqplot

Show the code
ggplot(balloon.data, aes(sample = size,color=color)) +
  stat_qq() +
  stat_qq_line() +
  scale_color_manual(values=c('blue','green','red'))

A normal distribution (such as blue) would have the points near the line, but a non-normal distribution would have values that dont fit the line well (green or red).

Statistical Methods to Test for Normality

Show the code
balloon.data %>%
  group_by(color) %>%
  summarize(Shapiro.Wilk.Test = shapiro.test(size)$p.value) %>% #extracts the p-value from the shapiro wilk tests 
  kable(caption="Shapiro-Wilk tests for normality when grouped by color")
Shapiro-Wilk tests for normality when grouped by color
color Shapiro.Wilk.Test
blue 0.8797698
green 0.0311985
red 0.0096922
Show the code
balloon.data %>%
  group_by(color,shape) %>% #now with color and shape
  summarize(Shapiro.Wilk.Test = shapiro.test(size)$p.value) %>% #extracts the p-value from the shapiro wilk tests 
  kable(caption="Shapiro-Wilk tests for normality when grouped by color and shape")
`summarise()` has grouped output by 'color'. You can override using the
`.groups` argument.
Shapiro-Wilk tests for normality when grouped by color and shape
color shape Shapiro.Wilk.Test
blue circle 0.5437592
blue square 0.5474969
green circle 0.1448867
green square 0.2593481
red circle 0.0163393
red square 0.1048015

We interpret this that if \(p_{shapiro-wilk}<0.05\) then the data are not normally distributed, which is the case here. If the data is not normal, you could transform your dependent variable (ie look at log(size) rather than size). You could also use a non-parametric test such as a Kruskal-Wallis test (for groups) or Mann-Whitney test (for pairs).

The second way to test for normality is to evaluate the residuals of your model for normality. The residuals are the difference between the model-predicted value and the measured value. These should be normally ditributed.

Show the code
aov(size~color, data=balloon.data) %>% 
  residuals %>% 
  shapiro.test %>%
  tidy %>%
  kable(caption="Shapiro-Wilk test for the residuals of the model.")
Shapiro-Wilk test for the residuals of the model.
statistic p.value method
0.8707586 0.005444 Shapiro-Wilk normality test

Post-Hoc Tests

If you get a significant result from the omnibus ANOVA, you often will want to do further post-hoc testing to look at pairwise comparisons. As a general rule, we test the omnibus test first prior to proceeding to pairwise analyses. There are three main ways to do this:

  • Pairwise t-Tests (remember to check the assumptions of normality and equal variance when choosing a test)
  • Tukey’s Honest Significant Difference – Compares all combinations
  • Dunnett’s Test – Compares all values to a control
Show the code
#save an anova object
balloon.aov <- aov(size~color,  data=balloon.data)
#can run a Tukey's test on this
TukeyHSD(balloon.aov) %>% 
  tidy %>%
  kable(caption="One method for performing a Tukey's HSD test.")
One method for performing a Tukey’s HSD test.
term contrast null.value estimate conf.low conf.high adj.p.value
color green-blue 0 0.6426212 -4.729262 6.014504 0.9512327
color red-blue 0 1.7360938 -2.916094 6.388281 0.6213220
color red-green 0 1.0934726 -3.558715 5.745660 0.8256367
Show the code
#this is the same as:
library(multcomp)
Loading required package: mvtnorm
Loading required package: survival
Loading required package: TH.data
Loading required package: MASS

Attaching package: 'MASS'
The following object is masked from 'package:dplyr':

    select

Attaching package: 'TH.data'
The following object is masked from 'package:MASS':

    geyser
Show the code
glht(balloon.aov, linfct=mcp(color="Tukey")) %>% 
  tidy %>% 
  kable(caption="Tukey's HSD tests for baloon color, using glht")
Tukey’s HSD tests for baloon color, using glht
term contrast null.value estimate std.error statistic adj.p.value
color green - blue 0 0.6426212 2.131217 0.3015278 0.9509435
color red - blue 0 1.7360938 1.845688 0.9406213 0.6198169
color red - green 0 1.0934726 1.845688 0.5924470 0.8247665
Show the code
#Can also use glht for Dunnett's tests
#define the control group to what you want to compare to
balloon.data$color <- relevel(balloon.data$color, ref="blue")
#run a dunnett's test
glht(balloon.aov, linfct=mcp(color="Dunnett")) %>% 
  tidy %>% 
  kable(caption="Dunnett's test comparing to blue")
Dunnett’s test comparing to blue
term contrast null.value estimate std.error statistic adj.p.value
color green - blue 0 0.6426212 2.131217 0.3015278 0.9324820
color red - blue 0 1.7360938 1.845688 0.9406213 0.5394621
Show the code
#default is single-step but can manually set FDR adjustment methods
glht(balloon.aov, linfct=mcp(color="Dunnett")) %>% 
  summary(adjusted(type="BH")) %>% 
  tidy %>% 
  kable(caption="Benjamini-Hochberg adjusted dunnett's test for effect of color on size")
Benjamini-Hochberg adjusted dunnett’s test for effect of color on size
term contrast null.value estimate std.error statistic adj.p.value
color green - blue 0 0.6426212 2.131217 0.3015278 0.7659777
color red - blue 0 1.7360938 1.845688 0.9406213 0.7151879
Show the code
glht(balloon.aov, linfct=mcp(color="Dunnett")) %>% 
  summary(adjusted(type="bonferroni")) %>% 
  tidy %>%
  kable(caption="Bonferroni adjusted dunnett's test for effect of color on size")
Bonferroni adjusted dunnett’s test for effect of color on size
term contrast null.value estimate std.error statistic adj.p.value
color green - blue 0 0.6426212 2.131217 0.3015278 1.0000000
color red - blue 0 1.7360938 1.845688 0.9406213 0.7151879

Pairwise t-tests

As with glht your post-hoc tests may benefit from adjusting for multiple hypotheses. Two common methods are Bonferroni (reference is Dunn (1961)) or Benjamini and Hochberg (1995).

Show the code
pairwise.t.test(balloon.data$size, balloon.data$color) %>% tidy %>% kable(caption="Pairwise tests usint color and size")
Pairwise tests usint color and size
group1 group2 p.value
green blue 1
red blue 1
red green 1
Show the code
#default is Holm but can set to whatever you want or none
pairwise.t.test(balloon.data$size, balloon.data$color, p.adjust.method="BH")%>% tidy %>% kable(caption="Pairwise tests usint color and size, p-values adjusted by BH methods")
Pairwise tests usint color and size, p-values adjusted by BH methods
group1 group2 p.value
green blue 0.7659777
red blue 0.7659777
red green 0.7659777
Show the code
pairwise.t.test(balloon.data$size, balloon.data$color, p.adjust.method="none") %>% tidy %>% kable(caption="Pairwise tests usint color and size, no p-value adjustments")
Pairwise tests usint color and size, no p-value adjustments
group1 group2 p.value
green blue 0.7659777
red blue 0.3575939
red green 0.5598772

Session Information

R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Detroit
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] multcomp_1.4-26 TH.data_1.1-2   MASS_7.3-61     survival_3.7-0 
[5] mvtnorm_1.3-1   ggplot2_3.5.1   knitr_1.48      dplyr_1.1.4    
[9] broom_1.0.6    

loaded via a namespace (and not attached):
 [1] Matrix_1.7-0      gtable_0.3.5      jsonlite_1.8.8    compiler_4.4.1   
 [5] tidyselect_1.2.1  stringr_1.5.1     tidyr_1.3.1       splines_4.4.1    
 [9] scales_1.3.0      yaml_2.3.10       fastmap_1.2.0     lattice_0.22-6   
[13] R6_2.5.1          labeling_0.4.3    generics_0.1.3    htmlwidgets_1.6.4
[17] backports_1.5.0   tibble_3.2.1      munsell_0.5.1     pillar_1.9.0     
[21] rlang_1.1.4       utf8_1.2.4        stringi_1.8.4     xfun_0.47        
[25] cli_3.6.3         withr_3.0.1       magrittr_2.0.3    digest_0.6.37    
[29] grid_4.4.1        rstudioapi_0.16.0 sandwich_3.1-1    lifecycle_1.0.4  
[33] vctrs_0.6.5       evaluate_0.24.0   glue_1.7.0        farver_2.1.2     
[37] codetools_0.2-20  zoo_1.8-12        fansi_1.0.6       colorspace_2.1-1 
[41] rmarkdown_2.28    purrr_1.0.2       tools_4.4.1       pkgconfig_2.0.3  
[45] htmltools_0.5.8.1

References

Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society. Series B 57 (1): 289–300.
Dunn, Olive Jean. 1961. “Multiple Comparisons Among Means.” Journal of the American Statistical Association 56 (293): 52–64.
Nieuwenhuis, Sander, Birte U. Forstmann, and Eric-Jan Wagenmakers. 2011. “Erroneous Analyses of Interactions in Neuroscience: A Problem of Significance.” Nature Neuroscience 14 (9): 1105–9. https://doi.org/10.1038/nn.2886.