ANOVA: Analysis of Variance

Ziyuan Huang

Last Updated: 2026-02-12

Comparing Means

Why not just do lots of t-Tests?

Real World Analogy: The H1B Lottery

Enter ANOVA (Analysis of Variance)

The Logic of the F-Ratio

ANOVA as Regression

The Example: Viagra and Libido

The Data

library(rio)
master <- import("data/viagra.sav")
str(master)
## 'data.frame':    15 obs. of  3 variables:
##  $ person: num  1 2 3 4 5 6 7 8 9 10 ...
##   ..- attr(*, "label")= chr "Participant"
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ dose  : num  1 1 1 1 1 2 2 2 2 2 ...
##   ..- attr(*, "label")= chr "Dose of Viagra"
##   ..- attr(*, "format.spss")= chr "F8.0"
##   ..- attr(*, "labels")= Named num [1:3] 1 2 3
##   .. ..- attr(*, "names")= chr [1:3] "Placebo" "Low Dose" "High Dose"
##  $ libido: num  3 2 1 1 4 5 2 4 2 3 ...
##   ..- attr(*, "label")= chr "Libido"
##   ..- attr(*, "format.spss")= chr "F8.0"
master$dose <- factor(master$dose, 
                      levels = c(1,2,3),
                      labels = c("Placebo", "Low Dose", "High Dose"))

Step 1: Total Sum of Squares (\(SS_T\))

Step 2: Model Sum of Squares (\(SS_M\))

Step 3: Residual Sum of Squares (\(SS_R\))

Calculating the F-Ratio

  1. Mean Squares (MS): We divide the Sum of Squares (SS) by their degrees of freedom (df) to get the “average” variance.
    • \(MS_M = SS_M / df_M\)
    • \(MS_R = SS_R / df_R\)
  2. F-Ratio:
    • \[ F = \frac{MS_M}{MS_R} \]

Assumptions of ANOVA

Before running ANOVA, we must check if our data is suitable:

  1. Normality: Data should be roughly normally distributed within groups.
  2. Homogeneity of Variance: The spread (variance) of scores should be roughly the same in each group.
    • We test this with Levene’s Test.
    • We want Levene’s test to be non-significant (\(p > .05\)), meaning variances are equal.
  3. Independence: Scores are independent (one person’s score doesn’t influence another’s).

Running ANOVA in R (ezANOVA)

library(ez)

## Create a participant ID if you don't have one
master$partno <- 1:nrow(master)
options(scipen = 20)

ezANOVA(data = master,
        dv = libido,
        between = dose,
        wid = partno,
        type = 3, 
        detailed = T)
## Coefficient covariances computed by hccm()
## $ANOVA
##        Effect DFn DFd       SSn  SSd         F               p p<.05       ges
## 1 (Intercept)   1  12 180.26667 23.6 91.661017 0.0000005720565     * 0.8842381
## 2        dose   2  12  20.13333 23.6  5.118644 0.0246942895382     * 0.4603659
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd       SSn SSd         F         p p<.05
## 1   2  12 0.1333333 6.8 0.1176471 0.8900225

Interpreting the Output

  1. Levene’s Test:

    ezANOVA(data = master, dv = libido, between = dose, wid = partno, type = 3, detailed = T)$`Levene's Test for Homogeneity of Variance`
    ##   DFn DFd       SSn SSd         F         p p<.05
    ## 1   2  12 0.1333333 6.8 0.1176471 0.8900225
    • \(p = .89\) (greater than .05). Good! Assumptions met.
  2. ANOVA Results:

    ##        Effect DFn DFd       SSn  SSd         F               p p<.05       ges
    ## 1 (Intercept)   1  12 180.26667 23.6 91.661017 0.0000005720565     * 0.8842381
    ## 2        dose   2  12  20.13333 23.6  5.118644 0.0246942895382     * 0.4603659
    • \(F(2, 12) = 5.12\), \(p = .025\).
    • Conclusion: Since \(p < .05\), there is a significant difference in libido between the groups. The drug had an effect!

Post Hoc Tests: Where is the difference?

pairwise.t.test(master$libido,
                master$dose,
                p.adjust.method = "bonferroni", 
                paired = F, 
                var.equal = T)
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  master$libido and master$dose 
## 
##           Placebo Low Dose
## Low Dose  0.845   -       
## High Dose 0.025   0.196   
## 
## P value adjustment method: bonferroni

Effect Size: How big is the difference?

library(MOTE)
# Calculate Omega Squared
effect <- omega.F(dfm = 2, dfe = 12, Fvalue = 5.12, n = 15, a = .05)
effect$omega
## [1] 0.3545611

Trend Analysis

master$dose2 <- master$dose
contrasts(master$dose2) <- contr.poly(3) 
output <- aov(libido ~ dose2, data = master)
summary.lm(output)
## 
## Call:
## aov(formula = libido ~ dose2, data = master)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##   -2.0   -1.2   -0.2    0.9    2.0 
## 
## Coefficients:
##             Estimate Std. Error t value    Pr(>|t|)    
## (Intercept)   3.4667     0.3621   9.574 0.000000572 ***
## dose2.L       1.9799     0.6272   3.157     0.00827 ** 
## dose2.Q       0.3266     0.6272   0.521     0.61201    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.402 on 12 degrees of freedom
## Multiple R-squared:  0.4604, Adjusted R-squared:  0.3704 
## F-statistic: 5.119 on 2 and 12 DF,  p-value: 0.02469

Visualization

library(ggplot2)
cleanup <- theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), 
                panel.background = element_blank(), axis.line = element_line(color = "black"))

ggplot(master, aes(dose, libido)) +
  cleanup +
  stat_summary(fun.y = mean, geom = "bar", fill = "White", color = "Black") +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = .2) +
  xlab("Dosage") + ylab("Libido")

Summary