Comparing Means

In the last section, we learned how to compare two means using t-tests.
- Example: Is the average height of Group A different from Group B?
But what if we have more than two groups?
- Example: Comparing the effectiveness of three different drugs (Placebo, Low Dose, High Dose).
We can’t just run multiple t-tests.
- Why? Because it increases the chance of making a mistake (Type I Error).

Why not just do lots of t-Tests?

Imagine you have 3 groups. To compare them all, you’d need 3 t-tests:
- Group 1 vs Group 2
- Group 1 vs Group 3
- Group 2 vs Group 3
The Problem: Whenever there is no real difference, each test still has a 5% chance (\(\alpha = 0.05\)) of finding one anyway (a Type I Error).
If you run multiple tests on the same data, these risks compound.
- The probability of making at least one Type I Error across the whole set of tests is called the Familywise Error Rate.
- For 3 groups (3 tests), the risk isn’t 5%; it’s about 14% (\(1 - 0.95^3 \approx .14\), treating the tests as independent)!
- For 5 groups (10 tests), the risk jumps to about 40% (\(1 - 0.95^{10} \approx .40\))!
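The 14% and 40% figures come from \(1 - (1 - \alpha)^m\), where \(m\) is the number of pairwise tests. A minimal sketch in R, assuming (as the figures above do) that the tests are independent; pairwise tests that share groups are actually correlated, so treat these as approximations:

```r
alpha <- 0.05
k <- c(3, 5)                # number of groups
m <- choose(k, 2)           # number of pairwise t-tests: 3 and 10
fwer <- 1 - (1 - alpha)^m   # P(at least one false positive), independence assumed
round(fwer, 3)
## [1] 0.143 0.401
```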

Real World Analogy: The H1B Lottery

Think of it like entering a lottery (e.g., the H1B visa lottery).
If your chance of losing a single draw is 75%, then over 3 draws your chance of winning at least once rises to about 58% (\(1 - 0.75^3 \approx .58\)).
In science, “winning” a test by mistake is bad: it is a False Positive.
The more tests we run, the more likely we are to get at least one.
ANOVA avoids this by running a single test across all the groups.

Enter ANOVA (Analysis of Variance)

ANOVA is the solution. It compares several means at the same time while keeping the overall Type I error rate at 5%.
It is an Omnibus Test:
- It tests for an overall difference between groups.
- It tells us: “Hey, there is a difference somewhere among these groups.”
- It does not tell us specifically which groups are different (e.g., Group 1 vs Group 3).
Key Concept: ANOVA breaks down the total variation in the data into two parts:
1. Systematic Variance: Variance caused by our experiment (the “Signal”).
2. Unsystematic Variance: Variance caused by random factors (the “Noise”).

The Logic of the F-Ratio

ANOVA calculates a statistic called the F-ratio.
\[ F = \frac{\text{Model (Systematic Variance)}}{\text{Residual (Unsystematic Variance)}} \]
\[ F = \frac{\text{Good Variance}}{\text{Bad Variance}} \]
Logic:
- If the experiment worked, the “Good Variance” (differences caused by your groups) should be much larger than the “Bad Variance” (random noise).
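- If \(F > 1\), the systematic variance is larger than the unsystematic variance. That alone does not make the effect significant: F must also exceed the critical value for its degrees of freedom.
- The larger the F, the less likely it is that group differences this big arose from random noise alone (i.e., the smaller the p-value).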
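To see why "bigger than 1" is not enough, we can look up the critical F in R with the base functions `qf()` and `pf()`. The degrees of freedom below (2 and 12) are those of the three-group, 15-person example used later in this section; any other design would need its own values.

```r
df_M <- 3 - 1    # k - 1 groups
df_R <- 15 - 3   # N - k participants
crit <- qf(0.95, df1 = df_M, df2 = df_R)   # F needed for p < .05
crit                                       # about 3.89

# An F of 2 is greater than 1 but still not significant:
pf(2, df1 = df_M, df2 = df_R, lower.tail = FALSE)   # about 0.178
```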

ANOVA as Regression

Surprise! ANOVA is actually a special type of Linear Regression.
In regression, we predict an outcome (\(Y\)) based on predictors (\(X\)).
In ANOVA:
- Outcome (\(Y\)) = Continuous variable (e.g., Libido).
- Predictor (\(X\)) = Categorical variable (e.g., Drug Dose).
We are essentially asking: “Does knowing which group a person is in help us predict their score better than just guessing the average?”
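A small sketch of the equivalence in base R. The 15 scores are typed in so the block runs on its own; they are assumed to be the Viagra data introduced below (the first ten match the `str()` output in The Data, and all fifteen reproduce the sums of squares in the ezANOVA table). `lm()` turns the categorical predictor into dummy variables with Placebo as the baseline:

```r
libido <- c(3, 2, 1, 1, 4,   # Placebo
            5, 2, 4, 2, 3,   # Low Dose
            7, 4, 5, 3, 6)   # High Dose
dose <- factor(rep(c("Placebo", "Low Dose", "High Dose"), each = 5),
               levels = c("Placebo", "Low Dose", "High Dose"))

fit <- lm(libido ~ dose)
coef(fit)    # intercept = Placebo mean; other coefficients = difference from Placebo
anova(fit)   # the one-way ANOVA table: F on 2 and 12 df
```

Knowing the group shifts the prediction from the grand mean to that group's mean, and the regression F test asks whether that improvement beats noise.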

The Example: Viagra and Libido

Let’s look at an example from Andy Field’s book.
Research Question: Does Viagra affect libido?
Groups (Independent Variable):
1. Placebo (Sugar Pill)
2. Low Dose Viagra
3. High Dose Viagra
Outcome (Dependent Variable): Libido Score (Objective measure).

The Data

library(rio)
master <- import("data/viagra.sav")
str(master)

## 'data.frame':    15 obs. of  3 variables:
##  $ person: num  1 2 3 4 5 6 7 8 9 10 ...
##   ..- attr(*, "label")= chr "Participant"
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ dose  : num  1 1 1 1 1 2 2 2 2 2 ...
##   ..- attr(*, "label")= chr "Dose of Viagra"
##   ..- attr(*, "format.spss")= chr "F8.0"
##   ..- attr(*, "labels")= Named num [1:3] 1 2 3
##   .. ..- attr(*, "names")= chr [1:3] "Placebo" "Low Dose" "High Dose"
##  $ libido: num  3 2 1 1 4 5 2 4 2 3 ...
##   ..- attr(*, "label")= chr "Libido"
##   ..- attr(*, "format.spss")= chr "F8.0"

master$dose <- factor(master$dose, 
                      levels = c(1,2,3),
                      labels = c("Placebo", "Low Dose", "High Dose"))

Step 1: Total Sum of Squares (\(SS_T\))

First, we calculate the Total Variation in the data.
This is how much the data points differ from the Grand Mean (the average of everyone).
\[ SS_T = \sum_{i=1}^{N} (x_i - \bar{x}_{\text{grand}})^2 \]
This is the total “stuff” to be explained.

Degrees of Freedom (\(df_T\)) = \(N - 1 = 15 - 1 = 14\).
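A minimal sketch of Step 1 in base R, using the 15 Viagra scores typed in directly (assumed to match data/viagra.sav):

```r
libido <- c(3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6)  # assumed Viagra scores

grand_mean <- mean(libido)               # about 3.47
SS_T <- sum((libido - grand_mean)^2)     # squared distance of every score from the grand mean
df_T <- length(libido) - 1               # N - 1 = 14
c(SS_T = SS_T, df_T = df_T)              # SS_T is about 43.73
```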

Step 2: Model Sum of Squares (\(SS_M\))

This is the Good Variance.
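How much do the Group Means differ from the Grand Mean?
\[ SS_M = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x}_{\text{grand}})^2 \]
where \(n_j\) is the number of people in group \(j\).
If the groups are very different, this number will be big.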
This represents the effect of our experiment (the drug).

Degrees of Freedom (\(df_M\)) = \(k - 1\) (where \(k\) is number of groups) = \(3 - 1 = 2\).
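Step 2 as a short base-R sketch, again with the Viagra scores typed in as an assumption. Each group mean's distance from the grand mean is weighted by the group size:

```r
libido <- c(3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6)  # assumed Viagra scores
dose <- rep(c("Placebo", "Low", "High"), each = 5)

group_means <- tapply(libido, dose, mean)     # High 5.0, Low 3.2, Placebo 2.2 (alphabetical)
n_per_group <- tapply(libido, dose, length)   # 5 in each group
SS_M <- sum(n_per_group * (group_means - mean(libido))^2)   # about 20.13
df_M <- length(group_means) - 1               # k - 1 = 2
```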

Step 3: Residual Sum of Squares (\(SS_R\))

This is the Bad Variance (Error).
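How much do individual scores differ from their own Group Mean?
\[ SS_R = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 \]
This is variation we cannot explain with our experiment (individual differences).

Degrees of Freedom (\(df_R\)) = \(N - k = 15 - 3 = 12\).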
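A sketch of Step 3 in base R (Viagra scores assumed, as before). `ave()` replaces each score with its own group's mean, so the residual is simply the score minus that value:

```r
libido <- c(3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6)  # assumed Viagra scores
dose <- rep(c("Placebo", "Low", "High"), each = 5)

own_group_mean <- ave(libido, dose)            # ave() uses FUN = mean by default
SS_R <- sum((libido - own_group_mean)^2)       # 23.6
df_R <- length(libido) - length(unique(dose))  # N - k = 12
```

As a check, \(SS_M + SS_R\) should equal \(SS_T\) (20.13 + 23.6 = 43.73).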

Calculating the F-Ratio

Mean Squares (MS): We divide the Sum of Squares (SS) by their degrees of freedom (df) to get the “average” variance.
- \(MS_M = SS_M / df_M\)
- \(MS_R = SS_R / df_R\)
F-Ratio:
- \[ F = \frac{MS_M}{MS_R} \]
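Plugging in numbers makes the ratio concrete. This sketch uses the sums of squares that ezANOVA reports later in this section (SSn = 20.13333, SSd = 23.6) and base R's `pf()` for the p-value; the variable is named `F_ratio` to avoid masking R's `F` shorthand for `FALSE`:

```r
SS_M <- 20.13333; df_M <- 2    # model
SS_R <- 23.6;     df_R <- 12   # residual

MS_M <- SS_M / df_M            # about 10.07
MS_R <- SS_R / df_R            # about 1.97
F_ratio <- MS_M / MS_R         # about 5.12
p_value <- pf(F_ratio, df_M, df_R, lower.tail = FALSE)   # about .025
```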

Assumptions of ANOVA

Before running ANOVA, we must check if our data is suitable:

Normality: Data should be roughly normally distributed within groups.
Homogeneity of Variance: The spread (variance) of scores should be roughly the same in each group.
- We test this with Levene’s Test.
- We want Levene’s test to be non-significant (\(p > .05\)), meaning there is no evidence that the variances differ.
Independence: Scores are independent (one person’s score doesn’t influence another’s).
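Levene’s test is itself a one-way ANOVA, run on each score’s absolute distance from its group centre. A base-R sketch, assuming the Viagra scores below and the median-centred version (the default in `car::leveneTest()`); it reproduces the Levene values ezANOVA prints (F ≈ 0.118, p ≈ .89). Checking normality with a Shapiro-Wilk test on the model residuals is one common choice, shown here as an illustration:

```r
libido <- c(3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6)  # assumed Viagra scores
dose <- factor(rep(c("Placebo", "Low Dose", "High Dose"), each = 5),
               levels = c("Placebo", "Low Dose", "High Dose"))

# Homogeneity: ANOVA on absolute deviations from each group's median
abs_dev <- abs(libido - ave(libido, dose, FUN = median))
levene <- anova(lm(abs_dev ~ dose))
levene[1, c("F value", "Pr(>F)")]

# Normality: Shapiro-Wilk test on the one-way model's residuals
shapiro.test(residuals(lm(libido ~ dose)))
```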

Running ANOVA in R (`ezANOVA`)

We use the ez package, which makes ANOVA easy.
Important: `ezANOVA()` needs a participant ID column, passed to the `wid` argument.

library(ez)

# Create a participant ID if you don't have one
master$partno <- 1:nrow(master)
options(scipen = 20)  # print small p-values without scientific notation

ezANOVA(data = master,
        dv = libido,
        between = dose,
        wid = partno,
        type = 3, 
        detailed = TRUE)

## Coefficient covariances computed by hccm()

## $ANOVA
##        Effect DFn DFd       SSn  SSd         F               p p<.05       ges
## 1 (Intercept)   1  12 180.26667 23.6 91.661017 0.0000005720565     * 0.8842381
## 2        dose   2  12  20.13333 23.6  5.118644 0.0246942895382     * 0.4603659
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd       SSn SSd         F         p p<.05
## 1   2  12 0.1333333 6.8 0.1176471 0.8900225
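If ez is not installed, base R's `oneway.test()` gives the same F test without needing a participant ID. A sketch with the Viagra scores typed in (assumed to match data/viagra.sav); setting `var.equal = FALSE` instead would give Welch's version, which drops the equal-variance assumption:

```r
libido <- c(3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6)  # assumed Viagra scores
dose <- factor(rep(c("Placebo", "Low Dose", "High Dose"), each = 5),
               levels = c("Placebo", "Low Dose", "High Dose"))

# Classic one-way ANOVA F test (equal variances assumed)
res <- oneway.test(libido ~ dose, var.equal = TRUE)
res   # F on 2 and 12 df, p about .025
```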

Interpreting the Output

Levene’s Test:

ezANOVA(data = master, dv = libido, between = dose, wid = partno, type = 3, detailed = TRUE)$`Levene's Test for Homogeneity of Variance`

##   DFn DFd       SSn SSd         F         p p<.05
## 1   2  12 0.1333333 6.8 0.1176471 0.8900225

\(p = .89\) (greater than .05). Good! The homogeneity of variance assumption is met.

ANOVA Results:

##        Effect DFn DFd       SSn  SSd         F               p p<.05       ges
## 1 (Intercept)   1  12 180.26667 23.6 91.661017 0.0000005720565     * 0.8842381
## 2        dose   2  12  20.13333 23.6  5.118644 0.0246942895382     * 0.4603659

\(F(2, 12) = 5.12\), \(p = .025\).
Conclusion: Since \(p < .05\), there is a significant difference in libido between the groups. The dose of Viagra had an effect, but we don't yet know which groups differ.

Post Hoc Tests: Where is the difference?

The ANOVA told us something happened, but not what.
- Is High Dose > Placebo?
- Is Low Dose > Placebo?
- Is High Dose > Low Dose?
We use Post Hoc Tests (“post hoc” means “after the fact”) to compare pairs of groups.
We must correct for the “Familywise Error” we talked about earlier.
Bonferroni Correction: A strict correction that multiplies each p-value by the number of comparisons (here, 3), capping the result at 1.

pairwise.t.test(master$libido,
                master$dose,
                p.adjust.method = "bonferroni", 
                paired = FALSE, 
                pool.sd = TRUE)  # one pooled SD from all groups

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  master$libido and master$dose 
## 
##           Placebo Low Dose
## Low Dose  0.845   -       
## High Dose 0.025   0.196   
## 
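## P value adjustment method: bonferroni

Only High Dose vs Placebo is significant after correction (\(p = .025\)); Low Dose vs Placebo (\(p = .845\)) and High Dose vs Low Dose (\(p = .196\)) are not.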
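To see what the Bonferroni adjustment does, we can get the unadjusted pooled-SD p-values and multiply them by 3 ourselves. A sketch with the Viagra scores typed in (assumed to match data/viagra.sav); `pmin()` caps the adjusted values at 1, and the empty upper-triangle cell stays `NA`:

```r
libido <- c(3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6)  # assumed Viagra scores
dose <- factor(rep(c("Placebo", "Low Dose", "High Dose"), each = 5),
               levels = c("Placebo", "Low Dose", "High Dose"))

raw <- pairwise.t.test(libido, dose, p.adjust.method = "none")$p.value
bonf <- pmin(raw * 3, 1)   # 3 comparisons
round(bonf, 3)             # should match the Bonferroni table above
```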

Effect Size: How big is the difference?

Statistical significance (\(p < .05\)) only tells us that a result this extreme would be unlikely if there were no effect.
Effect Size tells us how large or meaningful the effect is.
Common measures:
- \(\eta^2\) (Eta Squared): proportion of the total variance explained by the grouping variable (\(SS_M / SS_T\)).
- \(\omega^2\) (Omega Squared): A less biased estimate (better for small samples).

library(MOTE)
# Calculate Omega Squared
effect <- omega.F(dfm = 2, dfe = 12, Fvalue = 5.12, n = 15, a = .05)
effect$omega

## [1] 0.3545611

\(\omega^2 = 0.35\). This is a Large Effect! (Small = .01, Medium = .06, Large = .14).
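Both effect sizes can also be computed by hand from the ANOVA table, without MOTE. This sketch uses the SSn and SSd values from the ezANOVA output and the standard one-way formula \(\omega^2 = (SS_M - df_M \cdot MS_R) / (SS_T + MS_R)\). The tiny difference from MOTE's 0.3546 comes from MOTE being given the rounded F = 5.12:

```r
SS_M <- 20.13333; df_M <- 2    # from the ezANOVA table (SSn, DFn)
SS_R <- 23.6;     df_R <- 12   # (SSd, DFd)

SS_T <- SS_M + SS_R
MS_R <- SS_R / df_R
eta_sq   <- SS_M / SS_T                               # about 0.46 (matches ges)
omega_sq <- (SS_M - df_M * MS_R) / (SS_T + MS_R)      # about 0.354
```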

Trend Analysis

If our groups have a logical order (Placebo < Low < High), we can check for trends.
Linear Trend: Does libido go up as dose goes up?
Quadratic Trend: Does it go up then down (curved)?

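master$dose2 <- master$dose                # copy, so dose keeps its default contrasts
contrasts(master$dose2) <- contr.poly(3)   # linear (.L) and quadratic (.Q) trend weights
output <- aov(libido ~ dose2, data = master)
summary.lm(output)                         # one t-test per trend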

## 
## Call:
## aov(formula = libido ~ dose2, data = master)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##   -2.0   -1.2   -0.2    0.9    2.0 
## 
## Coefficients:
##             Estimate Std. Error t value    Pr(>|t|)    
## (Intercept)   3.4667     0.3621   9.574 0.000000572 ***
## dose2.L       1.9799     0.6272   3.157     0.00827 ** 
## dose2.Q       0.3266     0.6272   0.521     0.61201    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.402 on 12 degrees of freedom
## Multiple R-squared:  0.4604, Adjusted R-squared:  0.3704 
## F-statistic: 5.119 on 2 and 12 DF,  p-value: 0.02469

The Linear Trend is significant (\(t = 3.16\), \(p = .008\)), but the Quadratic Trend is not (\(p = .61\)). As Viagra dose increases, libido increases, with no evidence of a curve.
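Where do the `dose2.L` and `dose2.Q` estimates come from? With equal group sizes and the orthonormal weights from `contr.poly()`, each estimate is just the weighted sum of the group means. A sketch, assuming the group means of the Viagra data (2.2, 3.2, 5.0), which follow from the scores but are not printed above:

```r
group_means <- c(Placebo = 2.2, Low = 3.2, High = 5.0)   # assumed from the Viagra scores

w <- contr.poly(3)   # .L = (-0.71, 0, 0.71), .Q = (0.41, -0.82, 0.41)
est <- colSums(w * group_means)   # each weight column times the group means, summed
est                               # .L about 1.980, .Q about 0.327
```

The linear weights compare High Dose against Placebo, while the quadratic weights compare the middle group against the two ends; a large `.L` with a small `.Q` is what a steady upward trend looks like.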

Visualization

Always plot your data!

library(ggplot2)
cleanup <- theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), 
                panel.background = element_blank(), axis.line = element_line(color = "black"))

ggplot(master, aes(dose, libido)) +
  cleanup +
  stat_summary(fun = mean, geom = "bar", fill = "white", color = "black") +
  # 95% CI error bars; mean_cl_normal() requires the Hmisc package
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = .2) +
  xlab("Dosage") + ylab("Libido")

Summary

ANOVA compares 3+ means while controlling the familywise error rate.
F-Ratio = Good Variance / Bad Variance (\(MS_M / MS_R\)).
Assumptions: Normality, Homogeneity of Variance (Levene’s Test) & Independence.
Post Hoc: Use Bonferroni to find specific group differences.
Effect Size: Use Omega Squared (\(\omega^2\)) to see how large the effect is.

ANOVA: Analysis of Variance