

Introduction

Welcome to Week 12! Previously, we learned how to compare two groups using t-tests. But what happens when we have three, four, or ten groups?

If we simply ran multiple t-tests for every pair of groups, we would run into the Familywise Error Rate (FWER) problem. Every test carries a 5% risk of a False Positive (Type I Error, \(\alpha = 0.05\)).

Mathematically, the probability of not making a Type I error in a single test is \(1 - 0.05 = 0.95\). If we run \(m\) independent tests, the probability of not making any error across all of them is \((0.95)^m\).

Therefore, the probability of making at least one error (the Familywise Error Rate) is:

\[ P(\text{Familywise Error}) = 1 - (0.95)^m \]

If you run 3 tests (\(m=3\)), your risk isn’t 5% anymore—it’s nearly 14%:

\[ 1 - (0.95)^3 = 1 - 0.857 \approx 0.143 \quad (14.3\%) \]
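The compounding can be checked numerically. A quick sketch in Python (used here just for the arithmetic, outside the R workflow):

```python
# Familywise error rate for m independent tests, each run at level alpha
def fwer(m, alpha=0.05):
    return 1 - (1 - alpha) ** m

print(round(fwer(1), 3))   # 0.05
print(round(fwer(3), 3))   # 0.143
print(round(fwer(10), 3))  # 0.401
```

With 10 tests, the familywise risk already exceeds 40%, which is why running many uncorrected pairwise t-tests is unacceptable.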

Real World Analogy: The H1B Lottery

Let’s look at a real-world example of how probabilities accumulate: The H1B Visa Lottery.

In 2024 (for the FY 2025 cap), the selection rate was approximately 25.6%. This means in any single year, you have a roughly 74.4% chance of not getting selected.

If you apply 3 times (e.g., over your 3-year OPT period), what is the chance you get selected at least once? It is not simply \(25.6\% \times 3 = 76.8\%\).

Using the same probability logic as above:

  1. Probability of failing 1 time: \(1 - 0.256 = 0.744\)
  2. Probability of failing 3 times in a row: \((0.744)^3 \approx 0.412\)
  3. Probability of succeeding at least once: \(1 - 0.412 = 0.588\)

So, even with 3 attempts, your overall success rate is about 58.8%.
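The three steps above translate directly into code; a quick numeric sketch in Python:

```python
p_select = 0.256                      # single-year selection rate (FY 2025)
p_fail_once = 1 - p_select            # chance of not being selected in one year
p_fail_three = p_fail_once ** 3       # chance of failing three years in a row
p_at_least_once = 1 - p_fail_three    # chance of at least one selection

print(round(p_fail_three, 3), round(p_at_least_once, 3))  # 0.412 0.588
```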

This is the inverse of the FWER problem. In FWER, we want to avoid a false positive, but the more tests we run, the more likely we are to make one. In the lottery, we want a success, and the more tries we make, the more likely we are to get one, but the probability never reaches 100%.

ANOVA (Analysis of Variance) solves the FWER problem by comparing all means simultaneously in a single “Omnibus” test. It relies on the F-Ratio [1].

Learning Objectives

  1. Understand the logic of ANOVA through visualization (SST, SSM, SSR).
  2. Conduct a One-Way ANOVA using the ez package.
  3. Check assumptions (Homogeneity of Variance).
  4. Perform Post Hoc tests (Bonferroni) to find specific differences.
  5. Calculate Effect Size (\(\omega^2\)).

Required Packages

We will use rio for data import, ez for simple ANOVA execution, MOTE for effect sizes, and ggplot2 for visualization.

if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse, rio, ez, MOTE, ggplot2, gridExtra)

Part 1: The Example (Viagra and Libido)

We will use a classic dataset from Andy Field’s textbook [1]. The research question is simple: Does Viagra affect libido?

  • Independent Variable (IV): Dose (Placebo, Low Dose, High Dose)
  • Dependent Variable (DV): Libido Score

Loading and Cleaning Data

# Import data
master <- import("data/viagra.sav")

# Convert 'dose' to a factor with labels
master$dose <- factor(master$dose, 
                      levels = c(1, 2, 3),
                      labels = c("Placebo", "Low Dose", "High Dose"))

# Create a participant ID (required for ezANOVA)
master$partno <- 1:nrow(master)

# Calculate Grand Mean and Group Means for visualization
grand_mean <- mean(master$libido)
group_means <- master %>%
  group_by(dose) %>%
  summarize(mean_libido = mean(libido))

# Merge group means back into master for plotting
master <- master %>%
  left_join(group_means, by = "dose")

head(master)

Part 2: The Logic of ANOVA (Visualized)

To understand ANOVA, we need to understand Variance. Variance is essentially the average squared distance of data points from the mean. ANOVA splits this variance into two parts: “Good Variance” (explained by our experiment) and “Bad Variance” (unexplained noise).

We calculate three types of “Sum of Squares” (SS) to measure this [1].

1. Total Sum of Squares (SST) - The “Total Variation”

First, we ignore the groups. We just ask: “How spread out is the data overall?” We measure the distance from Every Data Point to the Grand Mean (the average of everyone).

\[ SST = \sum (x_i - \bar{x}_{grand})^2 \]

2. Model Sum of Squares (SSM) - The “Good Variance”

Now we use our model (the groups). If our experiment worked, the groups should be different. We measure the distance from the Group Means to the Grand Mean. This represents the Signal: How much better is our model (using groups) than just guessing the average?

\[ SSM = \sum n_k (\bar{x}_{k} - \bar{x}_{grand})^2 \]

3. Residual Sum of Squares (SSR) - The “Bad Variance”

Finally, we look at what we couldn’t explain. Even within the “High Dose” group, not everyone has the same libido. We measure the distance from Every Data Point to its Group Mean. This represents the Noise or Error.

\[ SSR = \sum (x_{ik} - \bar{x}_{k})^2 \]
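These three quantities are not independent: SST = SSM + SSR always holds. A quick numeric check in Python on a toy dataset (the three groups and values below are made up for illustration, not the Viagra data):

```python
# Toy data: three groups of three observations each (illustrative values)
groups = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}

all_vals = [x for vals in groups.values() for x in vals]
grand_mean = sum(all_vals) / len(all_vals)
group_means = {g: sum(v) / len(v) for g, v in groups.items()}

# Distances from every point to the grand mean (total variation)
sst = sum((x - grand_mean) ** 2 for x in all_vals)
# Distances from group means to the grand mean, weighted by group size (signal)
ssm = sum(len(v) * (group_means[g] - grand_mean) ** 2 for g, v in groups.items())
# Distances from every point to its own group mean (noise)
ssr = sum((x - group_means[g]) ** 2 for g, v in groups.items() for x in v)

print(sst, ssm, ssr)  # 60.0 54.0 6.0
assert abs(sst - (ssm + ssr)) < 1e-9
```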

The F-Ratio Calculation

The F-Ratio is simply the ratio of the “Good Variance” (Signal) to the “Bad Variance” (Noise), each averaged over its degrees of freedom. These averages are called Mean Squares (MS) [1].

\[ F = \frac{MS_M}{MS_R} = \frac{SSM / df_M}{SSR / df_R} \]

If \(F\) is substantially greater than 1, the experimental effect is bigger than random noise.
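With the sums of squares from the ANOVA table in Part 3 (SSM = 20.133, SSR = 23.6, with 2 and 12 degrees of freedom), the ratio can be checked by hand; a quick numeric sketch in Python:

```python
ssm, df_m = 20.133, 2    # model sum of squares and df (SSn for dose)
ssr, df_r = 23.6, 12     # residual sum of squares and df (SSd)

ms_m = ssm / df_m        # signal, averaged per degree of freedom
ms_r = ssr / df_r        # noise, averaged per degree of freedom
f_ratio = ms_m / ms_r

print(round(f_ratio, 2))  # 5.12
```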


Part 3: Running ANOVA

We use the ezANOVA function from the ez package. It is popular because it automatically provides assumption checks and formats the output nicely.

Key Arguments:

  • dv: Dependent Variable (Outcome)
  • between: Between-subjects predictor (Group)
  • wid: Within-subject ID (Participant ID)
  • type: Sum of Squares type (Type 3 is standard for social sciences)

options(scipen = 20) # Turn off scientific notation

anova_model <- ezANOVA(data = master,
                       dv = libido,
                       between = dose,
                       wid = partno,
                       type = 3, 
                       detailed = TRUE)

print(anova_model)
## $ANOVA
##        Effect DFn DFd       SSn  SSd         F               p p<.05       ges
## 1 (Intercept)   1  12 180.26667 23.6 91.661017 0.0000005720565     * 0.8842381
## 2        dose   2  12  20.13333 23.6  5.118644 0.0246942895382     * 0.4603659
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd       SSn SSd         F         p p<.05
## 1   2  12 0.1333333 6.8 0.1176471 0.8900225

Part 4: Interpreting Results

1. Assumption Check: Levene’s Test

Before looking at the main result, we must check if the groups have equal variances (Homogeneity of Variance).

levene_results <- anova_model$`Levene's Test for Homogeneity of Variance`
print(levene_results)
##   DFn DFd       SSn SSd         F         p p<.05
## 1   2  12 0.1333333 6.8 0.1176471 0.8900225
levene_p <- levene_results$p
  • Result: \(p = 0.89\).
  • Interpretation: Since \(p > .05\), the variances are not significantly different. We have met the assumption of homogeneity.

2. The F-Test (Main Effect)

Now we look at the ANOVA table.

anova_table <- anova_model$ANOVA
print(anova_table)
##        Effect DFn DFd       SSn  SSd         F               p p<.05       ges
## 1 (Intercept)   1  12 180.26667 23.6 91.661017 0.0000005720565     * 0.8842381
## 2        dose   2  12  20.13333 23.6  5.118644 0.0246942895382     * 0.4603659
# Extract values for reporting
# ezANOVA detailed=TRUE includes Intercept, so we filter for "dose"
dose_row <- anova_table[anova_table$Effect == "dose", ]

df_model <- as.numeric(dose_row$DFn)
df_error <- as.numeric(dose_row$DFd)
f_value <- as.numeric(dose_row$F)
p_value <- as.numeric(dose_row$p)
  • Result: \(F(2, 12) = 5.12, p = 0.025\).
  • Interpretation: Since \(p < .05\), there is a significant difference in libido somewhere among the three groups. The model (signal) explains significantly more variance than the residuals (noise).

Part 5: Post Hoc Tests

The ANOVA is an “Omnibus” test—it tells us that there is a difference, but not where. To find out which specific groups differ (e.g., High Dose vs. Placebo), we run Post Hoc Tests.

We use the Bonferroni correction to adjust our p-values and keep the Familywise Error Rate at 5%.
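The Bonferroni adjustment is simple: each raw p-value is multiplied by the number of comparisons (and capped at 1). A minimal sketch in Python, with illustrative raw p-values (not taken from the output below):

```python
def bonferroni(p_values):
    m = len(p_values)  # number of comparisons in the family
    return [min(1.0, p * m) for p in p_values]

raw = [0.008, 0.282, 0.065]  # illustrative unadjusted p-values
print([round(p, 3) for p in bonferroni(raw)])  # [0.024, 0.846, 0.195]
```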

posthoc <- pairwise.t.test(master$libido,
                         master$dose,
                         p.adjust.method = "bonferroni", 
                         paired = FALSE, 
                         var.equal = TRUE)
print(posthoc)
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  master$libido and master$dose 
## 
##           Placebo Low Dose
## Low Dose  0.845   -       
## High Dose 0.025   0.196   
## 
## P value adjustment method: bonferroni
# Extract p-values (The matrix has rows: Low Dose, High Dose; cols: Placebo, Low Dose)
p_high_vs_placebo <- posthoc$p.value["High Dose", "Placebo"]
p_low_vs_placebo <- posthoc$p.value["Low Dose", "Placebo"]
p_high_vs_low <- posthoc$p.value["High Dose", "Low Dose"]

Interpretation:

  • High Dose vs. Placebo: \(p = 0.025\) (Significant!)
  • Low Dose vs. Placebo: \(p = 0.845\) (Not significant)
  • High Dose vs. Low Dose: \(p = 0.196\) (Not significant)

Conclusion: High Dose Viagra significantly increases libido compared to Placebo, but Low Dose does not.


Part 6: Effect Size

Statistical significance (\(p < .05\)) depends on sample size. To know how large the effect is, we calculate Omega Squared (\(\omega^2\)).

# Calculate Omega Squared using the values extracted earlier
# Note: MOTE's omega.F expects the full sample size for n
effect <- omega.F(dfm = df_model, dfe = df_error, Fvalue = f_value, n = nrow(master), a = .05)

omega_sq <- effect$omega
  • Result: \(\omega^2 = 0.35\).
  • Interpretation: This is a Large Effect (Rule of thumb: .01 = Small, .06 = Medium, .14 = Large). The drug explains about 35% of the variance in libido.
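As a cross-check, \(\omega^2\) can be computed by hand from the ANOVA table using the standard sums-of-squares formula \(\omega^2 = (SSM - df_M \cdot MS_R) / (SST + MS_R)\); a quick numeric sketch in Python, plugging in SSM = 20.133 and SSR = 23.6:

```python
ssm, ssr = 20.133, 23.6   # from the ANOVA table (SSn and SSd for dose)
df_m, df_r = 2, 12

sst = ssm + ssr           # total sum of squares
ms_r = ssr / df_r         # residual mean square
omega_sq = (ssm - df_m * ms_r) / (sst + ms_r)

print(round(omega_sq, 2))  # 0.35
```

This matches the \(\omega^2 = .35\) that Field reports for the Viagra data [1].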

Part 7: Trend Analysis

When our groups have a natural order (e.g., Placebo < Low Dose < High Dose), we can ask a more specific question than just “Are they different?” We can ask: “What is the shape of the relationship?”

This is called Trend Analysis. We typically test for two main shapes:

  1. Linear Trend (L): A straight line.
    • Logic: As the dose goes up, libido goes up (or down) consistently.
    • Analogy: Like climbing a ramp.
    • Hypothesis: “More drug = More effect.”
  2. Quadratic Trend (Q): A curve (U-shape or Inverted-U).
    • Logic: The effect goes up for a while, but then might level off or even go down at very high doses.
    • Analogy: Like throwing a ball—it goes up, peaks, and comes down.
    • Hypothesis: “There is a ‘sweet spot’ dose, and too much might be bad (or just not better).”
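The contrast weights behind contr.poly(3) are fixed for three ordered levels: linear proportional to (−1, 0, 1) and quadratic proportional to (1, −2, 1), each scaled to unit length. Applying them to the group means of Field's Viagra data (Placebo = 2.2, Low Dose = 3.2, High Dose = 5.0) reproduces the contrast estimates in the regression output below; a Python sketch:

```python
import math

means = [2.2, 3.2, 5.0]  # Placebo, Low Dose, High Dose group means

# Orthonormal polynomial contrasts for 3 levels (what R's contr.poly(3) returns)
linear = [-1 / math.sqrt(2), 0.0, 1 / math.sqrt(2)]
quadratic = [1 / math.sqrt(6), -2 / math.sqrt(6), 1 / math.sqrt(6)]

est_l = sum(w * m for w, m in zip(linear, means))
est_q = sum(w * m for w, m in zip(quadratic, means))

print(round(est_l, 4), round(est_q, 4))  # 1.9799 0.3266
```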
# Create a copy for trend analysis
master$dose_trend <- master$dose

# Set polynomial contrasts (Linear and Quadratic)
contrasts(master$dose_trend) <- contr.poly(3) 

# Run Analysis of Variance (aov is base R function)
trend_model <- aov(libido ~ dose_trend, data = master)
trend_summary <- summary.lm(trend_model)
trend_summary
## 
## Call:
## aov(formula = libido ~ dose_trend, data = master)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##   -2.0   -1.2   -0.2    0.9    2.0 
## 
## Coefficients:
##              Estimate Std. Error t value    Pr(>|t|)    
## (Intercept)    3.4667     0.3621   9.574 0.000000572 ***
## dose_trend.L   1.9799     0.6272   3.157     0.00827 ** 
## dose_trend.Q   0.3266     0.6272   0.521     0.61201    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.402 on 12 degrees of freedom
## Multiple R-squared:  0.4604, Adjusted R-squared:  0.3704 
## F-statistic: 5.119 on 2 and 12 DF,  p-value: 0.02469
# Extract P-values for dynamic reporting
p_linear <- trend_summary$coefficients["dose_trend.L", "Pr(>|t|)"]
p_quad <- trend_summary$coefficients["dose_trend.Q", "Pr(>|t|)"]

Interpretation of Results

  • Linear Trend (L): Significant (\(p = 0.008\)).
    • Meaning: Since \(p < .05\), we are confident that there is a straight-line relationship.
    • In Plain English: As we increase the dose of Viagra from Placebo to Low to High, libido scores consistently increase. The line on a graph would go straight up.
  • Quadratic Trend (Q): Not Significant (\(p = 0.612\)).
    • Meaning: Since \(p > .05\), there is no significant curve.
    • In Plain English: The data does not go up and then down (or down and then up). It’s just a steady increase.

Why does this matter? If the Quadratic Trend was significant, it would tell us something important: maybe “High Dose” isn’t actually better than “Low Dose” (maybe it causes side effects that lower libido). But here, the Linear Trend wins: More is Better (statistically speaking, for this dataset!).


Part 8: Final Visualization

Finally, we will create a publication-quality visualization. Instead of just showing the means (bars), we should show the distribution of the data.

  • Boxplots: Show the median and spread (quartiles).
  • Jittered Points: Show the actual individual participants (raw data).
  • Diamonds: Show the Group Means.
ggplot(master, aes(x = dose, y = libido, fill = dose)) +
  # 1. Boxplot with transparency (alpha) so we can see points
  geom_boxplot(alpha = 0.4, outlier.shape = NA, width = 0.5) +
  
  # 2. Add Jittered Points (Individual Participants)
  # Jitter adds random noise to x-axis so points don't overlap
  geom_jitter(width = 0.1, size = 3, alpha = 0.7) +
  
  # 3. Add Mean Points (White Diamonds)
  stat_summary(fun = mean, geom = "point", shape = 23, size = 4, 
               fill = "white", color = "black", stroke = 1.5) +
  
  # 4. Make it colorful and clean
  scale_fill_viridis_d(option = "plasma", begin = 0.3, end = 0.9) +
  theme_minimal() +
  labs(x = "Dosage Level", 
       y = "Libido Score", 
       title = "Impact of Viagra on Libido",
       subtitle = "Comparison of Placebo, Low Dose, and High Dose groups",
       caption = "White diamonds = Group Means; Dots = Individual Participants") +
  theme(legend.position = "none",
        plot.title = element_text(size = 16, face = "bold"),
        axis.title = element_text(size = 12))

Interpretation (Plain English)

This plot tells the whole story of our analysis:

  1. The Trend: Look at the White Diamonds (the means). They clearly go up as the dose increases. This matches our “Linear Trend” finding.
  2. The Spread: The colorful boxes show where the middle 50% of the data sits. The “High Dose” box is much higher on the Y-axis than the “Placebo” box.
  3. The Difference: We can visually see why the Post Hoc test found a significant difference between High Dose and Placebo—there is almost no overlap between the lowest High Dose scores and the Placebo scores.

Summary

In this lab, we performed a complete One-Way ANOVA workflow:

  1. Explored the data using visualizations to understand Total (SST), Model (SSM), and Residual (SSR) variance.
  2. Ran ANOVA and confirmed the model was significant (\(F(2, 12) = 5.12\)).
  3. Checked Assumptions (Levene’s Test was non-significant, so assumptions met).
  4. Found Specific Differences using Bonferroni Post Hoc tests (High Dose > Placebo).
  5. Calculated Effect Size (\(\omega^2 = 0.35\), large effect).

References

[1] A. Field, J. Miles, and Z. Field, “Comparing Several Means: ANOVA (GLM 1),” in Discovering Statistics Using R. London: SAGE Publications, 2012, ch. 10.